Rebuild of standby using incremental backup of primary
I have a long to-do list of things I want to test out, and one of them is rebuilding a standby using an incremental backup from the primary. My colleague Vitaly was recently faced with this problem when a customer relocated two primary 2-node RACs and their single-node standby databases to a new location and happened to start the standby databases in read-only mode. Vitaly tells the story:
As you may know, read-only mode prevents any redo logs from being applied to the standby database, but on the surface everything looks OK: no errors, and the MRP0 process is running and showing "applying log" in v$managed_standby.
The only problem is that the recovery is "stuck" on the last log the database was trying to apply before it was opened read-only.
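You can see this state on the standby with a quick query against v$managed_standby; this is a generic check, not output from the incident itself:

SQL> select process, status, sequence# from v$managed_standby;

In the situation described, MRP0 sits on the same sequence number indefinitely while reporting an apparently healthy status.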
Unfortunately, the customer did not notice the omission for over two weeks, and by the time I had a chance to look at the environments about 50G of redo logs had accumulated for each database. Some of them were missing and/or corrupt, which ruled out SCPing the logs over to the standby server and registering them with the standby databases.
Another factor behind the standby databases falling behind unnoticed was the absence of any error messages in the alert logs: every single log was shown as shipped and received.
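A simple cross-check would have flagged the problem; for example, comparing the highest log sequence received with the highest one actually applied on the standby (again a generic check, simplified to a single thread):

SQL> select max(sequence#) from v$archived_log;
SQL> select max(sequence#) from v$archived_log where applied = 'YES';

If the second number stops moving while the first keeps growing, logs are being received but not applied.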
In a case like this the only option is to rebuild the standby database, and in the past I have done it using the traditional RMAN duplicate-for-standby routine. However, in this particular case I had two databases to rebuild, one small and one large, and the network between the primary cluster and the standby was slow as well.
For the small database I decided to use the Grid Control GUI-based wizard for creating the standby database; this process is quite straightforward and is described in the documentation. For the large one, however, duplicating the database using RMAN would have been too slow, there could have been performance degradation during the run, and the maintenance window was too short for an out-of-hours run.
This was a perfect case for trying the "incremental backup" approach. The method is described in a number of sources (if you Google it), but none of the "vanilla" cases worked for me.
I will not be listing the names and detailed output due to the production nature of the work – just the list of steps.
So, this is what I did at the end of the day:
PRE-REQUISITES:
- The primary database can be single node or RAC and is running OK.
- No downtime of the primary is required.
- All Data Guard settings are intact.
Step-by-step:
- 1. Get the latest SCN from the standby:
select to_char(current_scn) from v$database;

10615562421
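Since this SCN drives the whole rebuild, a cautious extra check (my addition, not part of the original steps) is to also look at the lowest datafile checkpoint SCN on the standby and use the smaller of the two values:

SQL> select to_char(min(checkpoint_change#)) from v$datafile_header;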
- 2. Create an incremental backup on the Primary for all the changes since the SCN on the standby:
[oracle@primary backup]$ rman target /

connected to target database: PRIMARY (DBID=720063942)

RMAN> run
2> {
3> allocate channel d1 type disk;
4> allocate channel d2 type disk;
5> backup incremental from scn 10615562421 database format
6> '/tmp/backup/primary_%U';
7> release channel d1;
8> release channel d2;
9> }
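Before shipping anything, it does no harm to confirm what the backup produced; list backup summary is a standard RMAN command (output omitted here, as elsewhere):

RMAN> list backup summary;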
- 3. Create a standby control file on the Primary:
alter database create standby controlfile as '/tmp/backup/stby.ctl';
- 4. SCP the backup files and the standby control file to the standby server. A little tip: if you copy the backup files into a directory with the same name (like /tmp/backup here), your control file will already know about them and you can bypass the registration step later on.
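If the directory names cannot match, the backup pieces can instead be cataloged on the standby before the recovery; the path below is hypothetical:

RMAN> catalog start with '/u01/stage/backup/';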
- 5. The next step is to replace the standby control file with the new one. It may sound simple, but this proved to be the trickiest part, because the standby control file is OMF and sits in ASM. You will need to use RMAN for the restore operation:
– Restart the database in NOMOUNT mode, then:
restore controlfile from '/tmp/backup/stby.ctl';
– Mount the database.
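Put together, the control file replacement looks something like this in an RMAN session on the standby (a sketch with an illustrative prompt; RMAN can run the shutdown, startup and mount itself):

[oracle@standby ~]$ rman target /
RMAN> shutdown immediate;
RMAN> startup nomount;
RMAN> restore controlfile from '/tmp/backup/stby.ctl';
RMAN> alter database mount;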
At this point you have a control file with information about the files as they are on the Primary side, so the next step is to register everything we have on the Standby side:
catalog start with '+data/standby/';
Check the output and answer YES to register any reported standby files.
– Shutdown immediate your standby instance.
RMAN> switch database to copy;
RMAN> report schema;
At this stage you should have a nice and clean list of the actual standby files.
Now we are ready to apply our incremental backup to bring the standby in line with the Primary:
RMAN> recover database noredo;
Because the online redo logs are lost, you must specify the NOREDO option in the RECOVER command. You must also specify NOREDO if the online logs are available but the redo cannot be applied to the incrementals. If you do not specify NOREDO, then RMAN searches for redo logs after applying the incremental backup, and issues an error message when it does not find them.
When the recovery completes, you may start the managed recovery process again:
SQL> alter database recover managed standby database using current logfile disconnect;
Provided all FAL settings are correct, your managed recovery will pick up all logs generated on the primary since the incremental backup and you will have a fully synchronised standby again.
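To confirm the standby really is catching up, v$dataguard_stats gives a quick view of apply and transport lag (one last generic check):

SQL> select name, value from v$dataguard_stats where name in ('apply lag', 'transport lag');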