Understanding Oracle Database High Availability (HA) Solutions

RoadToMaster – Oracle Database offers an integrated suite of high availability solutions that increase availability and eliminate or minimize both planned and unplanned downtime. These solutions also help enterprises maintain business continuity 24 hours a day, seven days a week.

1. Oracle Real Application Clusters (RAC)

Oracle Real Application Clusters (RAC) allows Oracle Database to run any packaged or custom application unchanged across a set of clustered servers. This provides very high availability and flexible scalability. If a clustered server fails, Oracle Database continues running on the surviving servers. When more processing power is needed, you can add another server without interrupting access to data.

Oracle RAC enables multiple instances, linked by an interconnect, to share access to an Oracle database. In an Oracle RAC environment, Oracle Database runs on two or more systems in a cluster while concurrently accessing a single shared database. The result is a single database system that spans multiple hardware systems yet appears to the application as a single unified database. This enables Oracle RAC to provide high availability, scalability, and redundancy during failures within the cluster. Oracle RAC accommodates all system types, from read-only data warehouse and decision support (DSS) systems to update-intensive online transaction processing (OLTP) systems.

High availability configurations have redundant hardware and software that maintain operations by avoiding single points of failure. To accomplish this, Oracle Clusterware is installed as part of the Oracle RAC installation process. Oracle Clusterware is a portable cluster management solution that is integrated with, and designed specifically for, Oracle Database. In an Oracle RAC environment, Oracle Clusterware monitors all Oracle components (such as instances and listeners). If a failure occurs, Oracle Clusterware automatically attempts to restart the failed component. Oracle Clusterware can also manage non-Oracle processes. During outages, Oracle Clusterware relocates the processing performed by the inoperative component to a backup component. For example, if a node in the cluster fails, Oracle Clusterware causes client processes running on the failed node to reconnect and resume running on a surviving node.
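The restart-then-relocate behavior described above can be illustrated with a minimal Python sketch. It is a toy model built on assumptions and is not Oracle Clusterware's API or algorithm. `Cluster`, `Resource`, `monitor_pass`, the `restart` hook, and the node and resource names are all hypothetical. Each monitoring pass first tries to restart a failed component in place. If its node is down, the work moves to a surviving node.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A managed component, e.g. a listener or a database service (hypothetical model)."""
    name: str
    node: str
    running: bool = True


@dataclass
class Cluster:
    nodes: dict  # node name -> True if the node is up
    resources: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def surviving_node(self, exclude):
        """Pick any node that is up, other than the excluded one."""
        for node, up in self.nodes.items():
            if up and node != exclude:
                return node
        raise RuntimeError("no surviving node available")

    def monitor_pass(self, restart=lambda res: True):
        """One monitoring cycle: restart failed components, relocate if that is impossible.

        `restart` is a hypothetical hook that returns True when a local restart succeeds.
        """
        for res in self.resources:
            node_up = self.nodes[res.node]
            if node_up and res.running:
                continue  # healthy: nothing to do
            if node_up and restart(res):
                res.running = True
                self.log.append(f"restarted {res.name} on {res.node}")
                continue
            # The node is down (or the restart failed): move the work to a surviving node.
            target = self.surviving_node(exclude=res.node)
            self.log.append(f"relocated {res.name} from {res.node} to {target}")
            res.node, res.running = target, True


# Usage: a listener crashes and is restarted in place, then node1 fails entirely.
cluster = Cluster(nodes={"node1": True, "node2": True},
                  resources=[Resource("svc_oltp", "node1"), Resource("listener2", "node2")])
cluster.resources[1].running = False
cluster.monitor_pass()
cluster.nodes["node1"] = False
cluster.monitor_pass()
print(cluster.log)
# -> ['restarted listener2 on node2', 'relocated svc_oltp from node1 to node2']
```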

Oracle Clusterware requires two files: the Oracle Cluster Registry (OCR) and the voting disk. To avoid single points of failure, Oracle Clusterware automatically maintains redundant copies of these files. Oracle Clusterware also enables you to replace a damaged copy of the OCR online. If an instance fails, Oracle’s recovery processes quickly re-master resources, recover partial or failed transactions, and rapidly restore the system.

Oracle RAC provides the following benefits:
■ Ability to tolerate and quickly recover from computer and instance failures.
■ Fast, automatic, and intelligent connection and service relocation and failover.
■ Rolling patch upgrades for qualified one-off patches.
■ Rolling release upgrades of Oracle Clusterware.
■ Load balancing advisory.
■ Runtime connection load balancing.
■ Flexibility to increase processing capacity using commodity hardware without downtime or changes to the application.
■ Comprehensive manageability integrating database and cluster features.

2. Oracle Data Guard

Oracle Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases, enabling production Oracle databases to survive failures, disasters, errors, and data corruption. Data Guard maintains these standby databases as transactionally consistent copies of the production database. If the production database becomes unavailable because of a planned or an unplanned outage, Data Guard can switch any standby database to the production role, greatly reducing the downtime caused by the outage. Failover of data processing from the production database to a standby database can be fully automatic, with no human intervention, which reduces the management costs associated with the Data Guard configuration. Data Guard can be used with traditional backup, restore, and clustering solutions to provide a high level of data protection and data availability.

A Data Guard configuration consists of one production database and one or more physical or logical standby databases. The databases in a Data Guard configuration are connected by Oracle Net and may be geographically dispersed. There are no restrictions on where the databases are located, as long as they can communicate with each other. For example, you can have a standby database in the same building as your primary database to help manage planned downtime, and two or more standby databases in other locations for disaster recovery.
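The layout above uses a local standby for planned downtime and remote standbys for disaster recovery. A small Python toy model shows how the two kinds of role transition differ. A switchover is planned and swaps roles. A failover is unplanned and promotes a standby after the primary is lost. This is only an illustration built on assumptions. `DataGuardConfig`, `Database`, the role labels, and the rule of promoting the standby with the most applied redo are hypothetical choices for the sketch, not Data Guard's behavior or interface.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    location: str
    role: str          # "PRIMARY", "STANDBY", or "FAILED" (hypothetical labels)
    applied_seq: int   # last redo log sequence applied (hypothetical bookkeeping)
    available: bool = True


class DataGuardConfig:
    """Toy model of one primary database plus one or more standby databases."""

    def __init__(self, databases):
        if sum(d.role == "PRIMARY" for d in databases) != 1:
            raise ValueError("a configuration has exactly one primary database")
        self.databases = databases

    @property
    def primary(self):
        return next(d for d in self.databases if d.role == "PRIMARY")

    def switchover(self, target_name):
        """Planned role change: the primary and the chosen standby swap roles."""
        target = next(d for d in self.databases if d.name == target_name)
        if target.role != "STANDBY" or not target.available:
            raise ValueError(f"{target_name} is not an available standby")
        old = self.primary
        old.role, target.role = "STANDBY", "PRIMARY"
        return target

    def failover(self):
        """Unplanned role change: promote a standby after the primary is lost."""
        candidates = [d for d in self.databases if d.role == "STANDBY" and d.available]
        if not candidates:
            raise RuntimeError("no available standby to fail over to")
        # Assumption: prefer the standby that has applied the most redo (least data loss).
        new = max(candidates, key=lambda d: d.applied_seq)
        self.primary.role = "FAILED"  # to be reinstated later as a standby
        new.role = "PRIMARY"
        return new


# Usage: a local standby for planned maintenance, a remote one for disaster recovery.
dg = DataGuardConfig([
    Database("prod", "headquarters", "PRIMARY", 120),
    Database("prod_local", "headquarters", "STANDBY", 120),
    Database("prod_dr", "remote site", "STANDBY", 118),
])
dg.switchover("prod_local")   # planned maintenance on the original primary
dg.primary.available = False  # later, an unplanned outage hits the new primary
print(dg.failover().name)     # -> prod
```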

Oracle Data Guard provides the following benefits:
■ Maintenance of real-time, transactionally consistent database copies to protect against unplanned downtime and disaster.
■ Data protection against computer failures, human errors, data corruption, and site failures.
■ Reduction of planned downtime for hardware and system upgrades, Oracle patch sets, and database upgrades.
■ Automatic detection and resolution of missing data following a temporary loss of connectivity between the primary and standby databases.
■ Multiple levels of data protection and performance, to balance data availability against system performance requirements.
■ Efficient use of system resources by offloading reporting and backup operations from the production database to standby databases.
■ The ability to temporarily diverge a Redo Apply standby database for reporting or testing, and then resynchronize it with the primary database once complete.
■ Managed and automatic role transitions and application notification to minimize planned and unplanned downtime.
■ Automatic resynchronization of a failed primary database following a failover.
■ Management of all systems as a single configuration for simplified administration.
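One of the benefits listed above is the automatic detection and resolution of missing data after a temporary loss of connectivity. A short Python sketch can show the idea by tracking redo logs by sequence number. This is a simplified model built on assumptions, not Data Guard's implementation. `find_gaps`, `resolve_gaps`, and the in-memory `primary_archive` dictionary are hypothetical stand-ins for the logs the primary would ship to the standby.

```python
def find_gaps(received, first_seq, last_primary_seq):
    """Group the sequence numbers missing from `received` into (low, high) ranges."""
    gaps = []
    seq = first_seq
    while seq <= last_primary_seq:
        if seq in received:
            seq += 1
            continue
        low = seq
        while seq <= last_primary_seq and seq not in received:
            seq += 1
        gaps.append((low, seq - 1))
    return gaps


def resolve_gaps(received, primary_archive, first_seq):
    """Fetch every missing log from the primary's archive and add it to `received`.

    `primary_archive` is a hypothetical dict of sequence number -> log contents.
    """
    fetched = {}
    for low, high in find_gaps(received, first_seq, max(primary_archive)):
        for seq in range(low, high + 1):
            fetched[seq] = primary_archive[seq]
    received.update(fetched)  # adds the fetched sequence numbers (dict keys)
    return fetched


# Usage: after a network outage, the standby is missing some logs.
standby_received = {100, 101, 104, 107}
archive = {seq: f"redo log {seq}" for seq in range(100, 109)}
print(find_gaps(standby_received, 100, 108))  # -> [(102, 103), (105, 106), (108, 108)]
resolve_gaps(standby_received, archive, 100)
print(find_gaps(standby_received, 100, 108))  # -> []
```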

3. Oracle Streams

Oracle Streams enables the propagation and management of data, transactions, and events in a data stream, either within a database or from one database to another. Streams provides a set of elements that enables you to control what information is put into a data stream, how the stream is routed from node to node, what happens to events in the stream as they flow into each node, and how the stream terminates. You can use Oracle Streams to replicate a database or a subset of a database. You can also update data at multiple locations simultaneously. If a failure occurs at one of the locations, then surviving sites can continue to access and update data.
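The capture, route, and apply flow described above can be sketched in plain Python with in-memory queues. The sketch rests on several hypothetical pieces. These are the change-record format (dicts with `table`, `op`, `key`, and `row`), the functions `capture`, `propagate`, and `apply_changes`, and the site names. Oracle Streams itself is configured inside the database, not through code like this. The example only shows how capture rules, routing rules, and an apply step combine to produce a full copy at one site and a subset at another.

```python
from collections import deque


def capture(changes, include_tables):
    """Capture: put only changes to the selected tables into the stream."""
    return deque(c for c in changes if c["table"] in include_tables)


def propagate(stream, routes):
    """Propagation: copy each change to every destination whose routing rule accepts it."""
    queues = {dest: deque() for dest in routes}
    while stream:
        change = stream.popleft()
        for dest, accepts in routes.items():
            if accepts(change):
                queues[dest].append(change)
    return queues


def apply_changes(queue, replica, transform=lambda c: c):
    """Apply: dequeue, optionally transform, and apply each change to a replica."""
    while queue:
        change = transform(queue.popleft())
        rows = replica.setdefault(change["table"], {})
        if change["op"] in ("INSERT", "UPDATE"):
            rows[change["key"]] = change["row"]
        elif change["op"] == "DELETE":
            rows.pop(change["key"], None)
    return replica


# Usage: replicate two tables to one site and only the orders table to another.
changes = [
    {"table": "orders", "op": "INSERT", "key": 1, "row": {"amount": 10}},
    {"table": "audit_log", "op": "INSERT", "key": 9, "row": {"msg": "login"}},
    {"table": "orders", "op": "UPDATE", "key": 1, "row": {"amount": 12}},
    {"table": "customers", "op": "INSERT", "key": 7, "row": {"name": "Ada"}},
]
stream = capture(changes, include_tables={"orders", "customers"})
queues = propagate(stream, {
    "site_b": lambda c: True,                    # full copy of the captured tables
    "site_c": lambda c: c["table"] == "orders",  # subset: orders only
})
site_b = apply_changes(queues["site_b"], {})
site_c = apply_changes(queues["site_c"], {})
print(site_c)  # -> {'orders': {1: {'amount': 12}}}
```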

You can use Oracle Streams to build distributed applications that replicate changes at the application level using message queuing. If an application fails, then the surviving applications can continue to operate and provide access to data through locally maintained copies.

Oracle Streams provides fine-grained control over what is replicated and how it is replicated. It supports bidirectional replication, data transformations, subsetting, custom apply functions, and heterogeneous platforms. It also gives you complete control over the routing of change records from the primary database to a replica database.

As with Oracle Streams, Oracle Data Guard in SQL Apply mode can capture database changes, propagate them to destinations, and apply the changes at those destinations. Although Oracle Streams and Data Guard in SQL Apply mode share much of the same underlying technology, a high availability solution based on Data Guard in SQL Apply mode is easier to implement and manage than one based on Oracle Streams.

Oracle Streams provides the following benefits:
■ Data protection by maintaining a full or partial remote copy of the database.
■ Little or no downtime during database upgrades or maintenance operations, such as migrating a database to a different platform or character set, modifying database objects to support application upgrades, and applying an Oracle software patch.
■ Data replication by capturing DML and DDL changes made to database objects and replicating these changes to one or more other databases.
■ Event management and notification by enqueuing messages or capturing events, propagating the messages and events through queues, and dequeuing and applying or acting upon each message or event.
■ Support for heterogeneous platforms across the databases in the configuration.
■ Support for different character sets across replicas.
■ Fine-grained control of data sharing.

4. Automatic Storage Management (ASM)

Automatic Storage Management (ASM) provides a vertically integrated file system and volume manager directly in the Oracle kernel, resulting in:
■ Significantly less work to provision database storage.
■ Higher level of availability.
■ Elimination of the expense, installation, and maintenance of specialized storage products.
■ Unique capabilities for database applications.

For optimal performance, ASM spreads files across all available storage. To protect against data loss, ASM extends the SAME (stripe and mirror everything) concept and adds flexibility: it can mirror at the database file level rather than at the entire disk level.
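Two ideas in the paragraph above fit in a few lines of Python: striping a file across every disk, and choosing mirroring per file rather than per disk. This is a deliberately simplified sketch, not ASM's allocation algorithm. `allocate_extents`, the round-robin placement, the "next disk" mirror rule, and the disk names are hypothetical.

```python
from collections import Counter


def allocate_extents(n_extents, disks, mirror=False):
    """Stripe a file's extents round-robin across every disk in the group.

    With mirror=True, each extent also gets a copy on the next disk, so mirroring
    is chosen per file rather than for whole disks.
    """
    if mirror and len(disks) < 2:
        raise ValueError("mirroring needs at least two disks")
    layout = []
    for i in range(n_extents):
        copies = [disks[i % len(disks)]]
        if mirror:
            copies.append(disks[(i + 1) % len(disks)])  # always a different disk
        layout.append(copies)
    return layout


disks = ["disk1", "disk2", "disk3", "disk4"]
datafile = allocate_extents(8, disks, mirror=True)       # a file that needs protection
scratch_file = allocate_extents(8, disks, mirror=False)  # a file that does not
print(dict(Counter(d for copies in scratch_file for d in copies)))
# -> {'disk1': 2, 'disk2': 2, 'disk3': 2, 'disk4': 2}
```

Each disk receives the same share of I/O. Only the file that asked for mirroring pays for a second copy.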

More importantly, ASM simplifies setting up mirroring, adding disks, and removing disks. Instead of managing hundreds or even thousands of files (as in a large data warehouse), DBAs using ASM create and administer a larger-grained object called a disk group. A disk group identifies a set of disks that are managed as a logical unit. Automating the naming and placement of the underlying database files saves DBAs time and ensures adherence to standard best practices.

ASM's native mirroring (2-way or 3-way) is an option that protects against storage failures. With ASM mirroring, failure groups provide an additional level of data protection. A failure group is a set of disks sharing a common resource (such as a disk controller or an entire disk array) whose failure can be tolerated. Once failure groups are defined, ASM intelligently places redundant copies of the data in separate failure groups. This keeps the data available and transparently protected against the failure of any single component in the storage subsystem.
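The key rule behind failure groups is that no two copies of an extent share a failure group. A minimal Python sketch can make this rule concrete. It is an illustration under assumptions, not ASM's placement logic. `place_mirrored`, `survives`, the rotation scheme, and the controller and disk names are hypothetical. `copies=2` and `copies=3` stand in for 2-way and 3-way mirroring.

```python
import itertools


def place_mirrored(n_extents, failure_groups, copies=2):
    """Place each extent's copies on disks in distinct failure groups.

    copies=2 models 2-way mirroring and copies=3 models 3-way mirroring.
    """
    if copies > len(failure_groups):
        raise ValueError("need at least as many failure groups as copies")
    names = sorted(failure_groups)
    disk_cycles = {fg: itertools.cycle(failure_groups[fg]) for fg in names}
    placement = []
    for i in range(n_extents):
        # Rotate the starting group so copies are spread evenly across groups.
        groups = [names[(i + k) % len(names)] for k in range(copies)]
        placement.append([(fg, next(disk_cycles[fg])) for fg in groups])
    return placement


def survives(placement, failed_group):
    """True if every extent keeps at least one copy outside the failed group."""
    return all(any(fg != failed_group for fg, _ in extent) for extent in placement)


# Usage: three controllers, each one a failure group holding two disks.
groups = {
    "controller_a": ["a1", "a2"],
    "controller_b": ["b1", "b2"],
    "controller_c": ["c1", "c2"],
}
layout = place_mirrored(6, groups, copies=2)
print(layout[0])  # -> [('controller_a', 'a1'), ('controller_b', 'b1')]
print(all(survives(layout, fg) for fg in groups))  # -> True
```

Losing an entire controller, and so every disk behind it, still leaves one copy of each extent in another failure group.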

ASM provides the following benefits:
■ Mirroring across drives and storage arrays.
■ Automatic re-mirroring from a failed drive to the remaining drives.
■ Automatic rebalancing of stored data when disks are added or removed, while the database remains online.
■ Operational simplicity in managing a database storage grid.
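One benefit listed above is automatic rebalancing when disks are added or removed. A short Python sketch shows the goal: evening out the data while moving as little of it as possible. This is a hypothetical model, not ASM's rebalance algorithm. `rebalance` and the extent map are illustrative names, and the sketch does not model the database staying online during the move.

```python
def rebalance(extent_map, disks):
    """Even out extents across `disks` after a disk is added or dropped.

    extent_map maps extent id -> disk name and is updated in place. Only extents
    that sit on a dropped disk or exceed a disk's fair share are moved.
    Returns the list of (extent, from_disk, to_disk) moves.
    """
    base, extra = divmod(len(extent_map), len(disks))
    target = {d: base + (1 if i < extra else 0) for i, d in enumerate(disks)}
    kept = {d: 0 for d in disks}
    to_move = []
    for ext, disk in sorted(extent_map.items()):
        if disk in kept and kept[disk] < target[disk]:
            kept[disk] += 1  # stays where it is
        else:
            to_move.append(ext)  # surplus extent, or its disk was dropped
    moves = []
    for disk in disks:
        while kept[disk] < target[disk]:
            ext = to_move.pop()
            moves.append((ext, extent_map[ext], disk))
            extent_map[ext] = disk
            kept[disk] += 1
    return moves


# Usage: eight extents on two disks, then two disks are added, then disk1 is dropped.
extent_map = {i: "disk1" if i < 4 else "disk2" for i in range(8)}
print(len(rebalance(extent_map, ["disk1", "disk2", "disk3", "disk4"])))  # -> 4
print(len(rebalance(extent_map, ["disk2", "disk3", "disk4"])))           # -> 2
print("disk1" in extent_map.values())                                    # -> False
```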
