Approach to Troubleshoot an Abended OGG Process

Oracle GoldenGate is an Heterogeneous replication tool. It is very easy to install and configure Oracle GoldenGate. The real challenge comes when the Processes gets ABEND. Sometimes it is easy to detect problems, but sometimes we will be really not knowing how to proceed or approach to solve or troubleshoot the issue.

This article explains,

1. Levels of Failure in Oracle GoldenGate.
2. The approach to Troubleshoot Oracle GoldenGate.
3. How to identify the issue.
4. What are the files to be looked for Troubleshooting Oracle GoldenGate.
5. Tools to Monitor and Troubleshoot Oracle GoldenGate.

1. LEVELS OF FAILURE

Oracle GoldenGate can Abend or Fail at different levels. There might be many reasons for the Oracle GoldenGate Process failures. The different levels at which the Oracle GoldenGate processes fails or Abends are .,

1. Database Level
2. Network Level
3. Storage Level
4. User Level

a. DATABASE LEVEL OF FAILURE

Oracle GoldenGate also fails if you have issues at the Database Level. Below are some of the issues listed.,

Tablespace filled
Redo log corruption
Archive log destination filled
No Primary Key or Unique Index on the tables
Archive log Mode not enabled
Reset Log performed
Memory Problem with Streams_Pool_Size 
Database Hung

b. NETWORK LEVEL OF FAILURE

Network plays a vital role in the Oracle GoldenGate Replication. For each and every commands you execute in the GGSCI prompt, the Manager Process opens a port. There should be a proper, speedy network between the Source and Target sides. Some of the Network level failures are listed below.,

Network Fails
Network slow
Ports Unavailability
Firewall Enabled

c. STORAGE LEVEL OF FAILURE

There should be sufficient storage space available for the Oracle GoldenGate to keep the Trail files. Even at Oracle Database level, there should be sufficient space to retain the Archive Log files and also space for tablespaces. Proper privileges should be given to the file system so that, Oracle GoldenGate Processes creates the trail files in the location.

File System fills
File System corruption
No Proper privileges given to the File System
Connection Problem between Database Server and Storage
No Free Disks Available in Storage

d. USER LEVEL OF FAILURE

Of course, we users make some mistakes. Some of the user level of failures are below.,

Mistakenly Delete the GoldenGate Admin User at Database level.
Manually Performing Operations like Insert, Delete and Update at Target Side.
Manually deleting / removing the Trail Files either from Source server or Target server.
Forcefully Stopping any Oracle GoldenGate Processes like Manager, Extract, Pump, Collector or Replicat.
Killing the Oracle GoldenGate Processes at OS level.
Performing an ETROLLOVER at Extract / Pump / Replicat Processes.

So we have seen the different levels of Failures in Oracle GoldenGate. How to proceed if you face these failures in your day to day life. What is the approach to identify the issue and solve it.

2. HOW TO APPROACH?

The below are the steps on how to approach to the problem. If the environment is a known one, then you can skip some of the steps.

Learn and Understand the Environment
Operating Provider and Operating System Version
Database Provider and Database Version
Is it a Cluster, Active / Passive?
Oracle GoldenGate UniDirectional or Bi-Directional
If Oracle, then is it a Single Instance or RAC – Real Application Clusters
Is it a Homogeneous or Heterogeneous Environment Replication
Network Flow, Ports Used and Firewalls configured
Components used in Oracle GoldenGate like Extract, Pump, Replicat processes and Trails files.

After seeing all the prerequisites like Environment study etc, check if the Processes are up and running. INFO ALL is the command to check the status of the processes. There are different status of process.

RUNNING
The Process has started and running normally.

STOPPED
The Process has stopped either normally (Controlled Manner) or due to an error.

STARTING
The Process is starting.

ABENDED
The Process has been stopped in an uncontrolled manner. Abnormal End is known was ABEND.

From the above status of the Processes status, RUNNING, STOPPED and ABENDED are common. But what is STARTING? What actually happens when the Oracle GoldenGate process is in this state?

Whenever you start an Abended Extract Process, it will take some time to get started. It is because, the process is getting recovered from its last Abend point. To recover it’s processing state, the Extract Process search back to its Online Redo Log file or Archive log file, to find the first log record for the opened transactions when it is crashed. The more back the Extract Process goes in search, the more it takes to recover itself and get started. So, It takes more time depending upon how long back the Open transaction is in the Redo Logs or Archive Logs.

To check the status of the Extract Process and also to check if it is recovering properly, issue the command.,

THREE BASIC FILES

There are many files which needs to be checked whenever you face issue in Oracle GoldenGate. Out of which Oracle GoldenGate logs the activity in three files.

1. Error Log File – ggserr.log
2. Report File
3. Discard File

4. Trace File
5. Alert Log File
6. DDL Trace File

The first three are the very basic, also can be called as major files which are to be looked in to whenever there are problems in the Oracle GoldenGate. Below, is the explanation for these three files.

What is Error Log file – ggserr.log?

This file is created during the Installation of the Oracle GoldenGate. The file is created in the Oracle GoldenGate home directory with the name ggserr.log. For each installation of Oracle GoldenGate, a ggserr.log file is created in the respective Oracle GoldenGate directory. This file is updated by all the processes of Oracle GoldenGate and the below information are logged in this file.,

 Start and Stop of the Oracle GoldenGate Processes.
 Processing Information like Bounded Recovery operations.
 Error Messages.
 Informational messages like normal operations happening in Oracle GoldenGate.
 WARNING Messages like Long Running Transactions.
 Commands executed in GGSCI Prompt.

The format in which the Oracle GoldenGate processes logs the information in to this ggserr.log file is below.,

You can view this file in the ggsci prompt itself by using the command VIEW GGSEVT. But it is always better to view it using the OS tool as this file can grow a lot. The below is the example.,

So with the ggserr.log file you basically identify the below.,

 What is the Error?
 When the Error occurred?
 How Frequently it occurred?
 What were the operations performed before the Error occurred?
 How Frequently the error occurred?

What is Report File?

A Report file is a process specific log file. Each process has its own report file created and this file is created during the instantiation of the process. This file is stored in the directory /dirrpt and the format of this file is .rpt. This file is automatically renamed on the next instantiation of the process. If a process starts all the log entries for that process are written to its respective report file.

Let’s consider a process called EXT and the report file during instantiation of this process is called as EXT.rpt. If this process is stopped and started again, existing file EXT.rpt will be automatically renamed to EXT0.rpt and a new file will be generated with the name EXT.rpt and this occurs recursively till the value of the sequence reaches 9. If the last report file name for the process EXT is created as EXT9, now during the new file generation, the last file EXT9.rpt will be removed and EXT8.rpt will be renamed as EXT9.rpt. So, the report file with the lower sequence value will be the latest and younger one when compared with older sequence valued report file.

REPORTROLLOVER parameter is used to manually or forcefully create a new report file for the processes. To view the current report of the process the below command is used.,

To get the runtime statistics report of a process, use the below command,

The below information can be seen in the report file of a particular process.,

	Oracle GoldenGate Product Version and Release
	Operating Version, Release, Machine Type, Hostname and Ulimit settings of the respective process
	Memory Utilized by the respective process
	Configured Parameters of the respective Oracle GoldenGate Process
	Database Provider, Version and Release
	Trail files Information
	Mapping of Tables
	Informational messages with respective to a particular process
	Warning messages with respective to a particular process
	Error messages with respective to a particular process
	All DDL Operations performed.
	All the Discarded Errors and Ignored Operations
	Crash dumps
	Any commands which are performed on that particular process.

The below is the example of the Report file which I had split it to many parts so that you will get an clear understanding.

1. Oracle GoldenGate Product Version and Release. Operating Version, Release, Machine Type, Hostname and Ulimit settings of the respective process

2. Configured Parameters of the respective Oracle GoldenGate Process

3. Database Provider, Version, Release and Trail File information.

4. Mapping of tables and Informational messages with respect to the Process.

5. Crash dump and Error messages of the respective process.

Above examples clearly shows the contents of a Report file. So with the help of a Report file, the following can be known,

 In which Trail File the Process gets Abend.
 Whether the Trail File is moving forward?
 Whether the process is getting failed with Same Trail File?
 What operations has been performed before the process abend?
 Whether any errors in the Parameter configuration?
 Whether the MAP statements has the correct table names?

What is Discard File?

A log file for logging failed operations of the Oracle GoldenGate processes. It is mainly used for Data errors. In Oracle GoldenGate 11g, this file is not created by default. We have to mention a keyword DISCARDFILE to enable discard file logging. But from Oracle GoldenGate 12c, this file is generated by default during the instantiation of the process.

The Naming format of the Log file is ., but this file can named manually when enabling. Extension of this file is .DSC and this file is located in the directory /dirrpt

PURGE and APPEND keywords are used in the process parameter files to manually maintain the Discard File. Similar to the Report file, the Discard file can also be rolled over using the keyword DISCARDFILEROLLOVER. The syntax is as below.,

file_name
The relative or fully qualified name of the discard file, including the actual file name.

APPEND
Adds new content to existing content if the file already exists.

PURGE
Purges the file before writing new content.

MAXBYTESn | MEGABYTESn
File size in Bytes. For file size in bytes the valid range is from 1 to 2147483646. The default is 50000000. For file size in megabytes the valid range is from 1 to 2147. The default size is 50MB. If the specified size is exceeded, the process Abends.

NODISCARDFILE
When using this parameter, there will be no discard file creation. It prevents generating the Discard file.

The below is the example for the Discard file parameter used in the Replicat process parameter file.,

The Discard File is mainly used in the Target Side. Each and Every Replicat Process should have its own Discard File. This is a mandatory one.

The below is the example which shows the contents of the Discard file. The Replicat process got Abended due to the error OCI Error ORA-01403 : no data found. The discard file is as below.,

So, we have seen about the three basic and important file where Oracle GoldenGate Processes logs the information. There is also a tool which is used to troubleshoot Oracle GoldenGate during Data corruption or trail file corruption. This is mainly used when Data error occurs in the Oracle GoldenGate.

The tool is called LOGDUMP. It is a very useful tool which allows a user to navigate through the trail file and compare the information of the trail file with the data extracted and replicated by the processes. The below can be seen in the trail file using the LOGDUMP utility.,

 Transactions Information
 Operation type and Time when the Record written.
 Source Object name
 Image type, whether it is a Before Image or After Image.
 Column information with data and sequence information.
 Record length, Record data in ASCII format.
 RBA Information.

The below is the example of the contents of the Trail File.,

Some of the Logdump commands with the description are below., To get in to the logdump prompt, just run the logdump program from the Oracle GoldenGate Home directory.

Logdump 1> GHDR ON – To view the Record Header.

Logdump 2> DETAIL ON – To view the column information.

Logdump 3> DETAIL DATA – To view the Hex and ASCII values of the Column.

Logdump 4> USERTOKEN ON – User defined information specified in the Table of Map statements. These information are stored in the Trail file.

Logdump 4> GGSTOKEN ON – Oracle GoldenGate generated tokens. These tokens contains the Transaction ID, Row ID etc.,

Logdump 5> RECLEN length – Manually control the length of the record.

Logdump 6> OPEN file_name – To open a Trail file.

Logdump 7> NEXT – To move to the next File record. In short, you can use the letter N.

Logdump 8> POS rba – To position to a particular RBA.

Logdump 9> POS FIRST – To go to the position of the first record in the file.

Logdump 10> POS 0 – This is the alternate command for the POS FIRST. Either of this can be used.

Logdump 11> SCANFORENDTRANS – To go to the end of the transaction.

Logdump 12> HELP – To get the online help.

Logdump 13> EXIT – To exit from the Logdump prompt. You can also use QUIT alternatively.

Hope you got a clear view on how to approach to a Oracle GoldenGate problem and also find who stopped the Oracle GoldenGate process and the reason behind it.

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *