In this post we will look at three different methods of performing an initial data load from an Oracle 11g source database running on an HP-UX IA64 platform to a SQL Server 2012 target database hosted on Windows 2012 Datacenter.
The three methods we are using here are:
1) Oracle GoldenGate Direct Load over network without trail files
2) Oracle GoldenGate File to Replicat method
3) Oracle GoldenGate File with SQL Server BULK INSERT
These are some of the results obtained in our testing:
Initial load extract:
Between 2 and 3 million rows per minute
PRD.SH_BATCH_LOG table with 8001500 rows extracted in 4:30 minutes
PRD.AC_TRANSACTION_RACI table with 74104323 rows extracted in 21 minutes
Initial load replicat:
Between 1 to 1.5 million rows every 2 minutes
With single replicat process, table PRD.SH_BATCH_LOG with 7895001 rows took 15 minutes.
With 3 parallel replicat processes, the same 7.8 million row table was loaded in under 5 minutes. Each replicat processed about 2.6 million rows each.
3 parallel replicat processes pushed CPU utilization to around 60-70% mark but not higher.
Using 5 parallel replicat processes we were able to load a 177 million row table in little over 3 hours
The best performance obtained was using SQL Server BULK INSERT, where we were able to load 8 million rows in around 2 minutes.
1) Oracle GoldenGate Direct Load over network without trail files
Note – in the Direct Load method, no trail files are created – but this is not very efficient method for a large table.
GGSCI (db02) 2> edit params defgen DEFSFILE ./dirdat/source.def, USERID GGATE_OWNER@REDEVDB2, PASSWORD ggate TABLE PRD.T_PRODUCT_LINE; oracle@db02:/u01/oracle/goldengate > ./defgen paramfile /u01/oracle/goldengate/dirprm/defgen.prm
Copy source.def to ./dirdef directory on Windows 2012 server
Create the Initial Load Extract
GGSCI (db02) 5> add extract extinit1 sourceistable EXTRACT added. extract extinit1 userid ggate_owner password ggate RMTHOST DCV-RORSQL-N001.local, MGRPORT 7809 RMTTASK REPLICAT, GROUP rinit1 TABLE PRD.T_PRODUCT_LINE;
Next create the table in the SQL Server database
Create the initial load replicat on the target SQL Server GoldenGate environment
GGSCI (DCV-RORSQL-N001) 28> add replicat repinit1 specialrun REPLICAT added. GGSCI (DCV-RORSQL-N001) 29> edit params rinit1 replicat rinit1 TARGETDB sqlserver2012 SOURCEDEFS ./dirdef/source.def MAP PRD.T_PRODUCT_LINE, TARGET PRD.T_PRODUCT_LINE;
Start the initial load extract
GGSCI (db02) 11> start extract extinit1 Sending START request to MANAGER ... EXTRACT EXTINIT1 starting GGSCI (db02) 12> info extract extinit1 EXTRACT EXTINIT1 Initialized 2014-02-25 10:26 Status RUNNING Checkpoint Lag Not Available Log Read Checkpoint Not Available First Record Record 0 Task SOURCEISTABLE GGSCI (db02) 13> info extract extinit1 EXTRACT EXTINIT1 Last Started 2014-02-25 10:59 Status RUNNING Checkpoint Lag Not Available Log Read Checkpoint Table PRD.T_PRODUCT_LINE 2014-02-25 10:59:03 Record 1 Task SOURCEISTABLE GGSCI (db02) 14> info extract extinit1 EXTRACT EXTINIT1 Last Started 2014-02-25 10:59 Status STOPPED Checkpoint Lag Not Available Log Read Checkpoint Table PRD.T_PRODUCT_LINE 2014-02-25 10:59:06 Record 801 Task SOURCEISTABLE
When the Extract shows the status STOPPED, we can check the target PRD.T_PRODUCT_LINE via the SQL Server 2012 Management Studio and we find that 801 rows have been inserted in the table.
Note – in this method we do not need to start the replicat process on the target.
2) Oracle GoldenGate File to Replicat method
In this method we will create 3 replicat processes which will be running in parallel processing the trail files which are generated by the extract process.
The table has 74 million rows.
Create initial load extract
GGSCI (db02) 15> add extract extinit2 sourceistable EXTRACT added. GGSCI (db02) 3> edit params extinit2 extract extinit2 userid ggate_owner password ggate RMTHOST DCV-RORSQL-N001.local, MGRPORT 7809, tcpbufsize 10485760, tcpflushbytes 10485760 rmtfile ./dirdat/te, maxfiles 999999, megabytes 400, purge reportcount every 300 seconds, rate TABLE PRD.AC_TRANSACTION_RACI;
As in the first method, we create the definitions file using DEFGEN and then copy the generated file to the dirdef directory on the target SQL Server GoldenGate software home.
oracle@db02:/u01/oracle/goldengate > ./defgen paramfile /u01/oracle/goldengate/dirprm/defgen.prm
Create three parallel replicat groups
GGSCI (DCV-RORSQL-N001) 39> add replicat repinit2 exttrail ./dirdat/te REPLICAT added. GGSCI (DCV-RORSQL-N001) 40> add replicat repinit3 exttrail ./dirdat/te REPLICAT added. GGSCI (DCV-RORSQL-N001) 41> add replicat repinit4 exttrail ./dirdat/te REPLICAT added. GGSCI (DCV-RORSQL-N001) 42> edit params repinit2 GGSCI (DCV-RORSQL-N001) 43> edit params repinit3 GGSCI (DCV-RORSQL-N001) 44> edit params repinit4 GGSCI (DCV-RORSQL-N001) 45> view params repinit2 replicat repinit2 targetdb sqlserver2012 SOURCEDEFS ./dirdef/source.def reportcount every 60 seconds, rate overridedups end runtime MAP PRD.AC_TRANSACTION_RACI, TARGET PRD.AC_TRANSACTION_RACI , filter ( @RANGE (1,3)); GGSCI (DCV-RORSQL-N001) 46> view params repinit3 replicat repinit3 targetdb sqlserver2012 SOURCEDEFS ./dirdef/source.def reportcount every 60 seconds, rate overridedups end runtime MAP PRD.AC_TRANSACTION_RACI, TARGET PRD.AC_TRANSACTION_RACI , filter ( @RANGE (2,3)); GGSCI (DCV-RORSQL-N001) 47> view params repinit4 replicat repinit4 targetdb sqlserver2012 SOURCEDEFS ./dirdef/source.def reportcount every 60 seconds, rate overridedups end runtime MAP PRD.AC_TRANSACTION_RACI, TARGET PRD.AC_TRANSACTION_RACI , filter ( @RANGE (3,3));
Start the initial load extract
GGSCI (db02) 4> start extract extinit2 Sending START request to MANAGER ... EXTRACT EXTINIT2 starting GGSCI (db02) 5> info extract extinit2 EXTRACT EXTINIT2 Initialized 2014-02-25 11:27 Status RUNNING Checkpoint Lag Not Available Log Read Checkpoint Not Available First Record Record 0 Task SOURCEISTABLE GGSCI (db02) 6> ! info extract extinit2 EXTRACT EXTINIT2 Last Started 2014-02-25 11:56 Status RUNNING Checkpoint Lag Not Available Log Read Checkpoint Table PRD.AC_TRANSACTION_RACI 2014-02-25 11:56:51 Record 1 Task SOURCEISTABLE GGSCI (db02) 9> ! info extract extinit2 EXTRACT EXTINIT2 Last Started 2014-02-25 11:56 Status RUNNING Checkpoint Lag Not Available Log Read Checkpoint Table PRD.AC_TRANSACTION_RACI 2014-02-25 11:57:31 Record 2312001 Task SOURCEISTABLE
While the initial load extract is running start the three parallel replicat processe
GGSCI (DCV-RORSQL-N001) 48> start replicat repinit2 Sending START request to MANAGER ('MANAGER') ... REPLICAT REPINIT2 starting GGSCI (DCV-RORSQL-N001) 49> start replicat repinit3 Sending START request to MANAGER ('MANAGER') ... REPLICAT REPINIT3 starting GGSCI (DCV-RORSQL-N001) 50> start replicat repinit4 Sending START request to MANAGER ('MANAGER') ... REPLICAT REPINIT4 starting GGSCI (DCV-RORSQL-N001) 51> info replicat repinit2 REPLICAT REPINIT2 Last Started 2014-02-25 11:59 Status RUNNING Checkpoint Lag 00:02:45 (updated 00:00:02 ago) Process ID 5792 Log Read Checkpoint File ./dirdat/te000000 2014-02-25 11:57:10.276246 RBA 4893789
While the 3 replicat processes are running, we can see that they are each processing almost the same number of rows and the initial load task has been distributed between the 3 parallel replicat processes
GGSCI (DCV-RORSQL-N001) 54> stats replicat repinit2 latest Sending STATS request to REPLICAT REPINIT2 ... Start of Statistics at 2014-02-25 12:02:46. Replicating from PRD.AC_TRANSACTION_RACI to PRD.AC_TRANSACTION_RACI: *** Latest statistics since 2014-02-25 11:59:45 *** Total inserts 100663.00 Total updates 0.00 Total deletes 0.00 Total discards 0.00 Total operations 100663.00 End of Statistics. GGSCI (DCV-RORSQL-N001) 55> stats replicat repinit3 latest Sending STATS request to REPLICAT REPINIT3 ... Start of Statistics at 2014-02-25 12:02:56. Replicating from PRD.AC_TRANSACTION_RACI to PRD.AC_TRANSACTION_RACI: *** Latest statistics since 2014-02-25 11:59:45 *** Total inserts 100071.00 Total updates 0.00 Total deletes 0.00 Total discards 0.00 Total operations 100071.00 End of Statistics. GGSCI (DCV-RORSQL-N001) 56> stats replicat repinit4 latest Sending STATS request to REPLICAT REPINIT4 ... Start of Statistics at 2014-02-25 12:03:02. Replicating from PRD.AC_TRANSACTION_RACI to PRD.AC_TRANSACTION_RACI: *** Latest statistics since 2014-02-25 11:59:47 *** Total inserts 98042.00 Total updates 0.00 Total deletes 0.00 Total discards 0.00 Total operations 98042.00 End of Statistics.
We now see that the initial load extract has stopped and it has extracted 74 million rows
GGSCI (db02) 14> ! info extract extinit2 EXTRACT EXTINIT2 Last Started 2014-02-25 11:56 Status STOPPED Checkpoint Lag Not Available Log Read Checkpoint Table PRD.AC_TRANSACTION_RACI 2014-02-25 12:17:53 Record 74104323 Task SOURCEISTABLE
3) Oracle GoldenGate File with SQL Server BULK INSERT
In this method we use the SQL Server 2012 BULK INSERT to process the text file which is generated by the GoldenGate extract process.
Create the initial load extract
Note the parameter used in the extract file – FORMATASCII, BCP
This parameter instructs Oracle GoldenGate to write the output to a text file which is compatible with the SQL Server BCP utility.
GGSCI (db02) 19> add extract extbcp sourceistable EXTRACT added. GGSCI (db02) 20> edit params extbcp "/u01/oracle/goldengate/dirprm/extbcp.prm" 6 lines, 181 characters extract extbcp userid ggate_owner, password ggate FORMATASCII, BCP RMTHOST DCV-RORSQL-N001.local, MGRPORT 7809 rmtfile ./dirdat/myobjects.dat PURGE TABLE GGATE_OWNER.MYOBJECTS;
Start the initial load extract
GGSCI (db02) 1> start extract extbcp Sending START request to MANAGER ... EXTRACT EXTBCP starting GGSCI (db02) 2> info extract extbcp EXTRACT EXTBCP Last Started 2014-02-25 12:37 Status RUNNING Checkpoint Lag Not Available Log Read Checkpoint Table GGATE_OWNER.MYOBJECTS 2014-02-25 12:37:05 Record 1 Task SOURCEISTABLE GGSCI (db02) 3> ! info extract extbcp EXTRACT EXTBCP Last Started 2014-02-25 12:37 Status STOPPED Checkpoint Lag Not Available Log Read Checkpoint Table GGATE_OWNER.MYOBJECTS 2014-02-25 12:37:06 Record 78165 Task SOURCEISTABLE
Create the initial load replicat
GGSCI (DCV-RORSQL-N001) 62> edit params repbcp GGSCI (DCV-RORSQL-N001) 63> view params repbcp targetdb sqlserver2012 GENLOADFILES bcpfmt.tpl SOURCEDEFS ./dirdef/source.def extfile ./dirdat/myobjects.dat assumetargetdefs MAP GGATE_OWNER.MYOBJECTS, TARGET GGATE_OWNER.MYOBJECTS;
Start the replicat from the command line
D:\app\product\GoldenGate>replicat paramfile ./dirprm/repbcp.prm reportfile ./dirrpt/repbcp.rpt *********************************************************************** Oracle GoldenGate Delivery for SQL Server Version 12.1.2.0.1 17597485 OGGCORE_12.1.2.0.T2_PLATFORMS_131206.0309 Windows x64 (optimized), Microsoft SQL Server on Dec 6 2013 12:44:54 Copyright (C) 1995, 2013, Oracle and/or its affiliates. All rights reserved. Starting at 2014-02-25 12:41:14 *********************************************************************** Operating System Version: Microsoft Windows , on x64 Version 6.2 (Build 9200: ) Process id: 1840 Description: *********************************************************************** ** Running with the following parameters ** *********************************************************************** 2014-02-25 12:41:14 INFO OGG-03059 Operating system character set identified as windows-1252. 2014-02-25 12:41:14 INFO OGG-02695 ANSI SQL parameter syntax is used for parameter parsing. 2014-02-25 12:41:15 INFO OGG-01552 Connection String: provider=SQLNCLI11;initial catalog=PRD;data source=DCV-RORSQL-N001;persist security info=false;integrated security=sspi. 2014-02-25 12:41:15 INFO OGG-03036 Database character set identified as windows-1252. Locale: en_US. 2014-02-25 12:41:15 INFO OGG-03037 Session character set identified as windows-1252. 2014-02-25 12:41:15 INFO OGG-03528 The source database character set, as dtermined from the table definition file, is UTF-8. Using following columns in default map by name: object_id, object_name, object_type File created for BCP initiation: MYOBJECTS.bat File created for BCP format: MYOBJECTS.fmt Load files generated successfully.
In SQL Server 2012 Management Studio load the data into SQL Server table via the BULK INSERT command.
bulk insert [PRD].[GGATE_OWNER].[MYOBJECTS] from 'D:\app\product\GoldenGate\dirdat\myobjects.dat' with( DATAFILETYPE = 'char', FIELDTERMINATOR = '\t', ROWTERMINATOR = '0x0a' ); (78165 row(s) affected)