Oracle GoldenGate is an Heterogeneous replication tool. It is very easy to install and configure Oracle GoldenGate. The real challenge comes when the Processes gets ABEND. Sometimes it is easy to detect problems, but sometimes we will be really not knowing how to proceed or approach to solve or troubleshoot the issue.
This article explains,
1. Levels of Failure in Oracle GoldenGate.
2. The approach to Troubleshoot Oracle GoldenGate.
3. How to identify the issue.
4. What are the files to be looked for Troubleshooting Oracle GoldenGate.
5. Tools to Monitor and Troubleshoot Oracle GoldenGate.
1. LEVELS OF FAILURE
Oracle GoldenGate can Abend or Fail at different levels. There might be many reasons for the Oracle GoldenGate Process failures. The different levels at which the Oracle GoldenGate processes fails or Abends are .,
1. Database Level
2. Network Level
3. Storage Level
4. User Level
a. DATABASE LEVEL OF FAILURE
Oracle GoldenGate also fails if you have issues at the Database Level. Below are some of the issues listed.,
Tablespace filled Redo log corruption Archive log destination filled No Primary Key or Unique Index on the tables Archive log Mode not enabled Reset Log performed Memory Problem with Streams_Pool_Size Database Hung
b. NETWORK LEVEL OF FAILURE
Network plays a vital role in the Oracle GoldenGate Replication. For each and every commands you execute in the GGSCI prompt, the Manager Process opens a port. There should be a proper, speedy network between the Source and Target sides. Some of the Network level failures are listed below.,
Network Fails Network slow Ports Unavailability Firewall Enabled
c. STORAGE LEVEL OF FAILURE
There should be sufficient storage space available for the Oracle GoldenGate to keep the Trail files. Even at Oracle Database level, there should be sufficient space to retain the Archive Log files and also space for tablespaces. Proper privileges should be given to the file system so that, Oracle GoldenGate Processes creates the trail files in the location.
File System fills File System corruption No Proper privileges given to the File System Connection Problem between Database Server and Storage No Free Disks Available in Storage
d. USER LEVEL OF FAILURE
Of course, we users make some mistakes. Some of the user level of failures are below.,
Mistakenly Delete the GoldenGate Admin User at Database level. Manually Performing Operations like Insert, Delete and Update at Target Side. Manually deleting / removing the Trail Files either from Source server or Target server. Forcefully Stopping any Oracle GoldenGate Processes like Manager, Extract, Pump, Collector or Replicat. Killing the Oracle GoldenGate Processes at OS level. Performing an ETROLLOVER at Extract / Pump / Replicat Processes.
So we have seen the different levels of Failures in Oracle GoldenGate. How to proceed if you face these failures in your day to day life. What is the approach to identify the issue and solve it.
2. HOW TO APPROACH?
The below are the steps on how to approach to the problem. If the environment is a known one, then you can skip some of the steps.
Learn and Understand the Environment Operating Provider and Operating System Version Database Provider and Database Version Is it a Cluster, Active / Passive? Oracle GoldenGate UniDirectional or Bi-Directional If Oracle, then is it a Single Instance or RAC – Real Application Clusters Is it a Homogeneous or Heterogeneous Environment Replication Network Flow, Ports Used and Firewalls configured Components used in Oracle GoldenGate like Extract, Pump, Replicat processes and Trails files.
After seeing all the prerequisites like Environment study etc, check if the Processes are up and running. INFO ALL is the command to check the status of the processes. There are different status of process.
The Process has started and running normally.
The Process has stopped either normally (Controlled Manner) or due to an error.
The Process is starting.
The Process has been stopped in an uncontrolled manner. Abnormal End is known was ABEND.
From the above status of the Processes status, RUNNING, STOPPED and ABENDED are common. But what is STARTING? What actually happens when the Oracle GoldenGate process is in this state?
Whenever you start an Abended Extract Process, it will take some time to get started. It is because, the process is getting recovered from its last Abend point. To recover it’s processing state, the Extract Process search back to its Online Redo Log file or Archive log file, to find the first log record for the opened transactions when it is crashed. The more back the Extract Process goes in search, the more it takes to recover itself and get started. So, It takes more time depending upon how long back the Open transaction is in the Redo Logs or Archive Logs.
To check the status of the Extract Process and also to check if it is recovering properly, issue the command.,
THREE BASIC FILES
There are many files which needs to be checked whenever you face issue in Oracle GoldenGate. Out of which Oracle GoldenGate logs the activity in three files.
1. Error Log File – ggserr.log
2. Report File
3. Discard File
4. Trace File
5. Alert Log File
6. DDL Trace File
The first three are the very basic, also can be called as major files which are to be looked in to whenever there are problems in the Oracle GoldenGate. Below, is the explanation for these three files.
What is Error Log file – ggserr.log?
This file is created during the Installation of the Oracle GoldenGate. The file is created in the Oracle GoldenGate home directory
Start and Stop of the Oracle GoldenGate Processes.
Processing Information like Bounded Recovery operations.
Informational messages like normal operations happening in Oracle GoldenGate.
WARNING Messages like Long Running Transactions.
Commands executed in GGSCI Prompt.
The format in which the Oracle GoldenGate processes logs the information in to this ggserr.log file is below.,
You can view this file in the ggsci prompt itself by using the command VIEW GGSEVT. But it is always better to view it using the OS tool as this file can grow a lot. The below is the example.,
So with the ggserr.log file you basically identify the below.,
What is the Error?
When the Error occurred?
How Frequently it occurred?
What were the operations performed before the Error occurred?
How Frequently the error occurred?
What is Report File?
A Report file is a process specific log file. Each process has its own report file created and this file is created during the instantiation of the process. This file is stored in the directory
Let’s consider a process called EXT and the report file during instantiation of this process is called as EXT.rpt. If this process is stopped and started again, existing file EXT.rpt will be automatically renamed to EXT0.rpt and a new file will be generated with the name EXT.rpt and this occurs recursively till the value of the sequence reaches 9. If the last report file name for the process EXT is created as EXT9, now during the new file generation, the last file EXT9.rpt will be removed and EXT8.rpt will be renamed as EXT9.rpt. So, the report file with the lower sequence value will be the latest and younger one when compared with older sequence valued report file.
REPORTROLLOVER parameter is used to manually or forcefully create a new report file for the processes. To view the current report of the process the below command is used.,
To get the runtime statistics report of a process, use the below command,
The below information can be seen in the report file of a particular process.,
Oracle GoldenGate Product Version and Release Operating Version, Release, Machine Type, Hostname and Ulimit settings of the respective process Memory Utilized by the respective process Configured Parameters of the respective Oracle GoldenGate Process Database Provider, Version and Release Trail files Information Mapping of Tables Informational messages with respective to a particular process Warning messages with respective to a particular process Error messages with respective to a particular process All DDL Operations performed. All the Discarded Errors and Ignored Operations Crash dumps Any commands which are performed on that particular process.
The below is the example of the Report file which I had split it to many parts so that you will get an clear understanding.
1. Oracle GoldenGate Product Version and Release. Operating Version, Release, Machine Type, Hostname and Ulimit settings of the respective process
2. Configured Parameters of the respective Oracle GoldenGate Process
3. Database Provider, Version, Release and Trail File information.
4. Mapping of tables and Informational messages with respect to the Process.
5. Crash dump and Error messages of the respective process.
Above examples clearly shows the contents of a Report file. So with the help of a Report file, the following can be known,
In which Trail File the Process gets Abend.
Whether the Trail File is moving forward?
Whether the process is getting failed with Same Trail File?
What operations has been performed before the process abend?
Whether any errors in the Parameter configuration?
Whether the MAP statements has the correct table names?
What is Discard File?
A log file for logging failed operations of the Oracle GoldenGate processes. It is mainly used for Data errors. In Oracle GoldenGate 11g, this file is not created by default. We have to mention a keyword DISCARDFILE to enable discard file logging. But from Oracle GoldenGate 12c, this file is generated by default during the instantiation of the process.
The Naming format of the Log file is
PURGE and APPEND keywords are used in the process parameter files to manually maintain the Discard File. Similar to the Report file, the Discard file can also be rolled over using the keyword DISCARDFILEROLLOVER. The syntax is as below.,
The relative or fully qualified name of the discard file, including the actual file name.
Adds new content to existing content if the file already exists.
Purges the file before writing new content.
MAXBYTESn | MEGABYTESn
File size in Bytes. For file size in bytes the valid range is from 1 to 2147483646. The default is 50000000. For file size in megabytes the valid range is from 1 to 2147. The default size is 50MB. If the specified size is exceeded, the process Abends.
When using this parameter, there will be no discard file creation. It prevents generating the Discard file.
The below is the example for the Discard file parameter used in the Replicat process parameter file.,
The Discard File is mainly used in the Target Side. Each and Every Replicat Process should have its own Discard File. This is a mandatory one.
The below is the example which shows the contents of the Discard file. The Replicat process got Abended due to the error OCI Error ORA-01403 : no data found. The discard file is as below.,
So, we have seen about the three basic and important file where Oracle GoldenGate Processes logs the information. There is also a tool which is used to troubleshoot Oracle GoldenGate during Data corruption or trail file corruption. This is mainly used when Data error occurs in the Oracle GoldenGate.
The tool is called LOGDUMP. It is a very useful tool which allows a user to navigate through the trail file and compare the information of the trail file with the data extracted and replicated by the processes. The below can be seen in the trail file using the LOGDUMP utility.,
Operation type and Time when the Record written.
Source Object name
Image type, whether it is a Before Image or After Image.
Column information with data and sequence information.
Record length, Record data in ASCII format.
The below is the example of the contents of the Trail File.,
Some of the Logdump commands with the description are below., To get in to the logdump prompt, just run the logdump program from the Oracle GoldenGate Home directory.
Logdump 1> GHDR ON – To view the Record Header.
Logdump 2> DETAIL ON – To view the column information.
Logdump 3> DETAIL DATA – To view the Hex and ASCII values of the Column.
Logdump 4> USERTOKEN ON – User defined information specified in the Table of Map statements. These information are stored in the Trail file.
Logdump 4> GGSTOKEN ON – Oracle GoldenGate generated tokens. These tokens contains the Transaction ID, Row ID etc.,
Logdump 5> RECLEN length – Manually control the length of the record.
Logdump 6> OPEN file_name – To open a Trail file.
Logdump 7> NEXT – To move to the next File record. In short, you can use the letter N.
Logdump 8> POS rba – To position to a particular RBA.
Logdump 9> POS FIRST – To go to the position of the first record in the file.
Logdump 10> POS 0 – This is the alternate command for the POS FIRST. Either of this can be used.
Logdump 11> SCANFORENDTRANS – To go to the end of the transaction.
Logdump 12> HELP – To get the online help.
Logdump 13> EXIT – To exit from the Logdump prompt. You can also use QUIT alternatively.
Hope you got a clear view on how to approach to a Oracle GoldenGate problem and also find who stopped the Oracle GoldenGate process and the reason behind it.