How to enable Fast_start Failover?
FAST START FAILOVER CONFIGURATION:
DGMGRL> show fast_start failover
ASSIGNING FAILOVER TARGET:
DGMGRL> EDIT DATABASE 'sbi' SET PROPERTY FastStartFailoverTarget = 'sbistby';
Property "faststartfailovertarget" updated
DGMGRL> EDIT DATABASE 'sbistby' SET PROPERTY FastStartFailoverTarget = 'sbi';
Property "faststartfailovertarget" updated
DGMGRL> show fast_start failover
SETTING PROTECTION MODE:
DGMGRL> SHOW DATABASE 'sbi' 'LogXptStatus';
To Display the protection mode:
DGMGRL> show configuration
Enable maximum availability mode or maximum performance mode:
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Note:
1) If you cannot tolerate any loss of data, then ensure that the configuration protection mode is set to maximum availability. To do this, the LogXptMode database property for both the primary and target standby database must be set to SYNC.
DGMGRL> EDIT DATABASE 'sbi' SET PROPERTY LogXptMode=SYNC;
DGMGRL> EDIT DATABASE 'sbistby' SET PROPERTY LogXptMode=SYNC;
2) If you can tolerate data loss, then we can go for maximum performance mode and set FastStartFailoverLagLimit. This property specifies the amount of data, in seconds, that the target standby database can lag behind the primary database in terms of redo applied.
or
DGMGRL> EDIT DATABASE 'sbi' SET PROPERTY LogXptMode=ASYNC;
DGMGRL> EDIT DATABASE 'sbistby' SET PROPERTY LogXptMode=ASYNC;
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxPerformance;
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverLagLimit=45;
FAST START FAILOVER CONFIGURATION PROPERTY:
Fast-start failover will occur if both the observer and the target standby database lose connection to the primary database for the period of time specified by the FastStartFailoverThreshold configuration property.
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = '60';
Property "faststartfailoverthreshold" updated
Setting FastStartFailoverPmyShutdown:
If the FastStartFailoverPmyShutdown configuration property is set to TRUE, the primary database will shut down after FastStartFailoverThreshold seconds has elapsed if redo generation has been stalled and the primary database is unable to re-establish connectivity with either the observer or target standby database.
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverPmyShutdown = 'TRUE';
Property "faststartfailoverpmyshutdown" updated
DGMGRL> show fast_start failover;
Setting FastStartFailoverAutoReinstate:
This configuration property causes the former primary database to be automatically reinstated if a fast-start failover was initiated because the primary database was either isolated or had crashed.
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverAutoReinstate = 'TRUE';
Property "faststartfailoverautoreinstate" updated
Enable additional fast-start failover conditions:
Fast-start failover is done when both the observer and the standby cannot reach the primary after the configured time threshold (FastStartFailoverThreshold) has passed.
You can optionally indicate the database health conditions that should cause fast-start failover to occur.
######################
Below parameters are enable by default.
1) A datafile is offline because of a write error
2) Dictionary corruption of a critical database object
3) Control file damaged because of a disk error
4) LGWR is unable to write to any member of the log group because of an I/O error
5) Archive is unable to archive a redo log because the device is full or unavailable
6) Primary to observer and primary to standby network failure
7) An instance crash occurs (single instance)
8) All instances of a rac crash
9) Shutdown abort of primary
10) You can specify a error message, if you want to start the Fast start failover process. When I get ORA-xxxxx error is detected on the primary database with the following command:
DGMGRL> ENABLE FAST_START FAILOVER CONDITION xxxxx;
DGMGRL> enable fast_start failover condition "Inaccessible Logfile";
Succeeded.
DGMGRL> enable fast_start failover condition "Stuck Archiver";
Succeeded.
#####################
ENABLE FAST-START FAILOVER
DGMGRL> enable fast_start failover;
Enabled.
DGMGRL> show fast_start failover;
Start the Observer:
Must install DGMGRL on an observer computer (Not on the same DB server)
1) Install complete Oracle Client Administrator
2) Install a full db installation
1. PRE-REQS Must be in max availability or max performance
2. LogXptMode
LogXptMode must be in SYNC in max availability for 11g
LogXptMode must be in ASYNC in max performance for 11g
3. FLASHBACK DB must be enabled on primary and standby
4. tnsnames.ora must be configured on the observer
5. A static service name must exist so the observer can automatically restart databases.
You can start the observer before or after you enable fast-start failover. If fast-start failover is already enabled, the observer immediately begins monitoring the status and connections to the primary and target standby databases. If fast-start failover is not already enabled, the observer waits until fast-start failover gets enabled and then begins monitoring.
#!/bin/ksh
# startobserver
dgmgrl -logfile 11g_observer.log << eof
connect sys/password@sbi
START OBSERVER;
Eof
You can check the process in the unix side, whether the process is running
Ps –ef|grep filename
To view FAST START FAIL OVER INFORMATION (PRIMARY & STANDBY):
SQL> SELECT FS_FAILOVER_STATUS,FS_FAILOVER_CURRENT_TARGET, db_unique_name, FS_FAILOVER_THRESHOLD, FS_FAILOVER_OBSERVER_PRESENT,FS_FAILOVER_OBSERVER_HOST
FROM V$DATABASE;
To view the reason for the FAST START FAILOVER:
SELECT LAST_FAILOVER_TIME, LAST_FAILOVER_REASON FROM
V$FS_FAILOVER_STATUS;
#####################################
Configure the Fast-Start Failover database properties to meet the needs of your application. Available properties for Fast-Start Failover are as follows:
FastStartFailoverThreshold:
Used to specify the number of seconds to delay failover after the detection of a primary database failure. This parameter defaults to 30 seconds.
FastStartFailoverPmyShutdown:
In its default setting of true, this parameter causes the primary database to shut down when FastStartFailoverThreshold has been reached for a particular database. This parameter setting is ignored in the case of a user-configurable condition failover.
FastStartFailoverLagLimit:
Allows for definition of the number of seconds the standby database is able to fall behind. When this threshold is exceeded, automatic failover will not be allowed.
FastStartFailoverAutoReinstate:
When set to its default of true, this parameter enables the automatic reinstatement of a failed primary database as a standby. Automatic reinstatement is not possible for user-configurable failover conditions regardless of this parameter setting.
#####################################
Start the Observer on the designated server. The Observer should be run on a system within the same network segment that the application or application middle tiers run on to ensure that the Observer and the application have the same view of the database (in terms of connectivity). The requirement to run the Observer on a server other than the primary or standby systems is to have the 11g Release 2 RDBMS binaries or 11g Release 2 Administrator Client binaries on the system and to define tnsnames.ora entries for connectivity to the databases participating in the configuration. Multiple Observers can run from a single 11gRelease 2 (Admin Client or RDBMS) installation as long as the configuration file is uniquely identified on startup.
DGMGRL> start observer file='/tmp/fsfo_sbi.dat'
The Observer is a foreground process; therefore, control will not be returned to the user until the Observer has been stopped. For this reason, it is recommended that the Observer be run in the background and that its actions are logged to a file. On a Linux/Unix system, you can do the following:
# nohup dgmgrl -logfile /tmp/fsfo_mydg.log sys/password@sbi "start observer file='/tmp/fsfo_sbi.dat'" & (way to start the observer in the background)
How do we make the Observer resilient towards failure to enhance our MAA goals? The answer is Grid Control (or custom scripts if you are so inclined), which gives us out-of-the-box functionality to restart a failed Observer as well as the ability to failover an Observer to an alternate host. Should you not be running Grid Control to provide high availability for the Observer, don’t worry, because a failed Observer is not a show-stopper for your overall Oracle Data Guard configuration. In such a situation, the broker will report a ORA-16658 or ORA-16820, letting you (the MAA DBA) know that Fast-Start Failover is not operational and manual intervention is required.
DGMGRL> show fast_start failover
ASSIGNING FAILOVER TARGET:
DGMGRL> EDIT DATABASE 'sbi' SET PROPERTY FastStartFailoverTarget = 'sbistby';
Property "faststartfailovertarget" updated
DGMGRL> EDIT DATABASE 'sbistby' SET PROPERTY FastStartFailoverTarget = 'sbi';
Property "faststartfailovertarget" updated
DGMGRL> show fast_start failover
SETTING PROTECTION MODE:
DGMGRL> SHOW DATABASE 'sbi' 'LogXptStatus';
To Display the protection mode:
DGMGRL> show configuration
Enable maximum availability mode or maximum performance mode:
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
Note:
1) If you cannot tolerate any loss of data, then ensure that the configuration protection mode is set to maximum availability. To do this, the LogXptMode database property for both the primary and target standby database must be set to SYNC.
DGMGRL> EDIT DATABASE 'sbi' SET PROPERTY LogXptMode=SYNC;
DGMGRL> EDIT DATABASE 'sbistby' SET PROPERTY LogXptMode=SYNC;
2) If you can tolerate data loss, then we can go for maximum performance mode and set FastStartFailoverLagLimit. This property specifies the amount of data, in seconds, that the target standby database can lag behind the primary database in terms of redo applied.
or
DGMGRL> EDIT DATABASE 'sbi' SET PROPERTY LogXptMode=ASYNC;
DGMGRL> EDIT DATABASE 'sbistby' SET PROPERTY LogXptMode=ASYNC;
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxPerformance;
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverLagLimit=45;
FAST START FAILOVER CONFIGURATION PROPERTY:
Fast-start failover will occur if both the observer and the target standby database lose connection to the primary database for the period of time specified by the FastStartFailoverThreshold configuration property.
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = '60';
Property "faststartfailoverthreshold" updated
Setting FastStartFailoverPmyShutdown:
If the FastStartFailoverPmyShutdown configuration property is set to TRUE, the primary database will shut down after FastStartFailoverThreshold seconds has elapsed if redo generation has been stalled and the primary database is unable to re-establish connectivity with either the observer or target standby database.
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverPmyShutdown = 'TRUE';
Property "faststartfailoverpmyshutdown" updated
DGMGRL> show fast_start failover;
Setting FastStartFailoverAutoReinstate:
This configuration property causes the former primary database to be automatically reinstated if a fast-start failover was initiated because the primary database was either isolated or had crashed.
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverAutoReinstate = 'TRUE';
Property "faststartfailoverautoreinstate" updated
Enable additional fast-start failover conditions:
Fast-start failover is done when both the observer and the standby cannot reach the primary after the configured time threshold (FastStartFailoverThreshold) has passed.
You can optionally indicate the database health conditions that should cause fast-start failover to occur.
######################
Below parameters are enable by default.
1) A datafile is offline because of a write error
2) Dictionary corruption of a critical database object
3) Control file damaged because of a disk error
4) LGWR is unable to write to any member of the log group because of an I/O error
5) Archive is unable to archive a redo log because the device is full or unavailable
6) Primary to observer and primary to standby network failure
7) An instance crash occurs (single instance)
8) All instances of a rac crash
9) Shutdown abort of primary
10) You can specify a error message, if you want to start the Fast start failover process. When I get ORA-xxxxx error is detected on the primary database with the following command:
DGMGRL> ENABLE FAST_START FAILOVER CONDITION xxxxx;
DGMGRL> enable fast_start failover condition "Inaccessible Logfile";
Succeeded.
DGMGRL> enable fast_start failover condition "Stuck Archiver";
Succeeded.
#####################
ENABLE FAST-START FAILOVER
DGMGRL> enable fast_start failover;
Enabled.
DGMGRL> show fast_start failover;
Start the Observer:
Must install DGMGRL on an observer computer (Not on the same DB server)
1) Install complete Oracle Client Administrator
2) Install a full db installation
1. PRE-REQS Must be in max availability or max performance
2. LogXptMode
LogXptMode must be in SYNC in max availability for 11g
LogXptMode must be in ASYNC in max performance for 11g
3. FLASHBACK DB must be enabled on primary and standby
4. tnsnames.ora must be configured on the observer
5. A static service name must exist so the observer can automatically restart databases.
You can start the observer before or after you enable fast-start failover. If fast-start failover is already enabled, the observer immediately begins monitoring the status and connections to the primary and target standby databases. If fast-start failover is not already enabled, the observer waits until fast-start failover gets enabled and then begins monitoring.
#!/bin/ksh
# startobserver
dgmgrl -logfile 11g_observer.log << eof
connect sys/password@sbi
START OBSERVER;
Eof
You can check the process in the unix side, whether the process is running
Ps –ef|grep filename
To view FAST START FAIL OVER INFORMATION (PRIMARY & STANDBY):
SQL> SELECT FS_FAILOVER_STATUS,FS_FAILOVER_CURRENT_TARGET, db_unique_name, FS_FAILOVER_THRESHOLD, FS_FAILOVER_OBSERVER_PRESENT,FS_FAILOVER_OBSERVER_HOST
FROM V$DATABASE;
To view the reason for the FAST START FAILOVER:
SELECT LAST_FAILOVER_TIME, LAST_FAILOVER_REASON FROM
V$FS_FAILOVER_STATUS;
#####################################
Configure the Fast-Start Failover database properties to meet the needs of your application. Available properties for Fast-Start Failover are as follows:
FastStartFailoverThreshold:
Used to specify the number of seconds to delay failover after the detection of a primary database failure. This parameter defaults to 30 seconds.
FastStartFailoverPmyShutdown:
In its default setting of true, this parameter causes the primary database to shut down when FastStartFailoverThreshold has been reached for a particular database. This parameter setting is ignored in the case of a user-configurable condition failover.
FastStartFailoverLagLimit:
Allows for definition of the number of seconds the standby database is able to fall behind. When this threshold is exceeded, automatic failover will not be allowed.
FastStartFailoverAutoReinstate:
When set to its default of true, this parameter enables the automatic reinstatement of a failed primary database as a standby. Automatic reinstatement is not possible for user-configurable failover conditions regardless of this parameter setting.
#####################################
Start the Observer on the designated server. The Observer should be run on a system within the same network segment that the application or application middle tiers run on to ensure that the Observer and the application have the same view of the database (in terms of connectivity). The requirement to run the Observer on a server other than the primary or standby systems is to have the 11g Release 2 RDBMS binaries or 11g Release 2 Administrator Client binaries on the system and to define tnsnames.ora entries for connectivity to the databases participating in the configuration. Multiple Observers can run from a single 11gRelease 2 (Admin Client or RDBMS) installation as long as the configuration file is uniquely identified on startup.
DGMGRL> start observer file='/tmp/fsfo_sbi.dat'
The Observer is a foreground process; therefore, control will not be returned to the user until the Observer has been stopped. For this reason, it is recommended that the Observer be run in the background and that its actions are logged to a file. On a Linux/Unix system, you can do the following:
# nohup dgmgrl -logfile /tmp/fsfo_mydg.log sys/password@sbi "start observer file='/tmp/fsfo_sbi.dat'" & (way to start the observer in the background)
How do we make the Observer resilient towards failure to enhance our MAA goals? The answer is Grid Control (or custom scripts if you are so inclined), which gives us out-of-the-box functionality to restart a failed Observer as well as the ability to failover an Observer to an alternate host. Should you not be running Grid Control to provide high availability for the Observer, don’t worry, because a failed Observer is not a show-stopper for your overall Oracle Data Guard configuration. In such a situation, the broker will report a ORA-16658 or ORA-16820, letting you (the MAA DBA) know that Fast-Start Failover is not operational and manual intervention is required.
Configuring Transparent Application Failover (TAF) for Fast Start Fail Over (FSFO)
Client Side
Tnsnames.ora
SALES=
(DESCRIPTION_LIST=
(LOAD_BALANCE=off)(FAILOVER=on)
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=Chicago)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=OrderEntry)))
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=Texas)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=OrderEntry))))
Server Side
Service created on both primary and standby must enable TAF at server side
Configure DELAY and RETRY parameters
Primary Side:
srvctl add service -d sbi -s orderentry -r chicago -l PRIMARY -q TRUE -e SESSION -m BASIC -w 10 -z 150
Standby Side:
srvctl add service -d sbistby -s orderentry -r texas -l PRIMARY -q TRUE -e SESSION -m BASIC -w 10 -z 150
Tnsnames.ora
SALES=
(DESCRIPTION_LIST=
(LOAD_BALANCE=off)(FAILOVER=on)
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=Chicago)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=OrderEntry)))
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=Texas)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=OrderEntry))))
Server Side
Service created on both primary and standby must enable TAF at server side
Configure DELAY and RETRY parameters
Primary Side:
srvctl add service -d sbi -s orderentry -r chicago -l PRIMARY -q TRUE -e SESSION -m BASIC -w 10 -z 150
Standby Side:
srvctl add service -d sbistby -s orderentry -r texas -l PRIMARY -q TRUE -e SESSION -m BASIC -w 10 -z 150
How to configure TAF at serverside
Fast Start Failover (FSF)
Needs an observer process on a separate machine to ensure that the failover is initiated in a appropriate manner. e.g. If a standby lost connectivity to the primary due to a network issue, rather than because the primary has failed.
Client Connections
Role based services is available in Grid Infrastructure.
This is where you create various services for different database roles. So a "production" service will only be running and accepting connections when the database role is "primary" e.g.
srvctl add service -d <db name> -s <service name> -l PRIMARY
The services are created with srvctl.
Prior to 11gr2, this could only be performed using dbms_service and triggers. This not recommended for 11gr2. However, it does mean that grid infrastructure needs to be installed if triggers and dbms_service is not going to be used.
How to Configure Services for Client Failover
High Level Steps
1. Create production service on primary node
2. Start service on production node
3. Ensure standby database is know to srvctl
4. Create production service on standby node.
5. Create production service entry in tnsnames on primary and standby nodes
Notes
1. The production service does not get started on the standby database as it cannot be started on the standby node as it is already running on the production node. At failover time the service will be moved to the standby node.
Step 1
a) Create production service
srvctl add service -d sbi -s sbiprimary -l PRIMARY -m BASIC -e SELECT -w 1 -z 180
Step 2
a) Start service
srvctl start service -d sbi -s sbiprimary
Step 3
a) Add standby database to the srvctl on standby node.
srvctl add database -d sbistby -o /u01/app/oracle/product/11.2.0/dbhome_1 -m us.oracle.com -p /u01/app/oracle/product/11.2.0/dbhome_1/dbs/spfilepc10sby1.ora -r PHYSICAL_STANDBY -a "SBDAT,SBFRA"
srvctl config database -d sbistby
Step 4
a) Create the production service on the standby node
srvctl add service -d sbistby-s sbiprimary -l PRIMARY -m BASIC -e SELECT -w 1 -z 180
Step 5
a) Create the service entries in the tnsnames on both nodes
Add service entry on the primary node
sbipriamry =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP) (HOST = chicago)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP) (HOST = texas)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = sbiprimary)
)
Add service entry on the standby node
sbiprimary =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP) (HOST = chicago)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP) (HOST = texas)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = sbiprimary)
)
Needs an observer process on a separate machine to ensure that the failover is initiated in a appropriate manner. e.g. If a standby lost connectivity to the primary due to a network issue, rather than because the primary has failed.
Client Connections
Role based services is available in Grid Infrastructure.
This is where you create various services for different database roles. So a "production" service will only be running and accepting connections when the database role is "primary" e.g.
srvctl add service -d <db name> -s <service name> -l PRIMARY
The services are created with srvctl.
Prior to 11gr2, this could only be performed using dbms_service and triggers. This not recommended for 11gr2. However, it does mean that grid infrastructure needs to be installed if triggers and dbms_service is not going to be used.
How to Configure Services for Client Failover
High Level Steps
1. Create production service on primary node
2. Start service on production node
3. Ensure standby database is know to srvctl
4. Create production service on standby node.
5. Create production service entry in tnsnames on primary and standby nodes
Notes
1. The production service does not get started on the standby database as it cannot be started on the standby node as it is already running on the production node. At failover time the service will be moved to the standby node.
Step 1
a) Create production service
srvctl add service -d sbi -s sbiprimary -l PRIMARY -m BASIC -e SELECT -w 1 -z 180
Step 2
a) Start service
srvctl start service -d sbi -s sbiprimary
Step 3
a) Add standby database to the srvctl on standby node.
srvctl add database -d sbistby -o /u01/app/oracle/product/11.2.0/dbhome_1 -m us.oracle.com -p /u01/app/oracle/product/11.2.0/dbhome_1/dbs/spfilepc10sby1.ora -r PHYSICAL_STANDBY -a "SBDAT,SBFRA"
srvctl config database -d sbistby
Step 4
a) Create the production service on the standby node
srvctl add service -d sbistby-s sbiprimary -l PRIMARY -m BASIC -e SELECT -w 1 -z 180
Step 5
a) Create the service entries in the tnsnames on both nodes
Add service entry on the primary node
sbipriamry =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP) (HOST = chicago)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP) (HOST = texas)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = sbiprimary)
)
Add service entry on the standby node
sbiprimary =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP) (HOST = chicago)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP) (HOST = texas)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = sbiprimary)
)