Export On-Prem Oracle Database and Import Into RDS Using Datapump

Bagi Chandrakasan


Now that we have created users and tablespaces, it's time to start transferring the data to RDS. This can be achieved in many ways depending on the size of the DB.

  1. Tools like SQL Developer - copy database option.
  2. Create CSV files and use SQLLDR or custom applications to import.
  3. Use DB Links to pull or push data from one DB to another.
  4. Use AWS DMS (Database migration service) to transfer.
  5. Use Datapump to export data - depending on size of the file, can transfer to S3 and then import or transfer to RDS directly for import.

Depending on complexity and size, choose the best method. We’ll cover the Datapump method here.

The following steps assume you have already created an Oracle RDS instance along with the necessary tablespaces and user accounts.
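
The prerequisites can be double-checked from SQL before starting. A quick sketch, run as the master user while connected to the RDS instance; the schema and tablespace names are placeholders for whatever you created earlier.

-- Confirm the application accounts and tablespaces exist on the RDS side.
-- SCHEMA1/SCHEMA2 and TBS_DATA are placeholders.
SELECT username, default_tablespace, account_status
  FROM dba_users
 WHERE username IN ('SCHEMA1', 'SCHEMA2');

SELECT tablespace_name, status
  FROM dba_tablespaces
 WHERE tablespace_name = 'TBS_DATA';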

1. Verify DATA_PUMP_DIR and create a DB link

Verify DATA_PUMP_DIR exists on both the on-prem database and RDS.


SELECT * FROM DBA_DIRECTORIES;

Create a DB link from the on-prem database to the RDS instance.



CREATE DATABASE LINK TO_RDSNAME
 CONNECT TO <RDS_USER_ACCOUNT>
 IDENTIFIED BY <password>
 USING '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST="rds-end-point")
 (PORT=1521))(CONNECT_DATA=(SERVICE_NAME="rds_sid")))';

-- Verify db-link works.

SELECT * FROM DUAL@TO_RDSNAME;

2. Create Dump file(s) on on-prem DB

Create a plan. Do we want to transfer all the schemas in one dump file, or group them by size and dependencies? Do we want to run the exports in parallel or one after another? (A parallel variant is sketched after the export block below.)


DECLARE
hdnl NUMBER;
BEGIN
hdnl := DBMS_DATAPUMP.OPEN( operation => 'EXPORT', job_mode => 'SCHEMA', job_name=>null);
DBMS_DATAPUMP.ADD_FILE( handle => hdnl, filename => 'datafile.dmp',
          directory => 'DATA_PUMP_DIR', filetype => dbms_datapump.ku$_file_type_dump_file);
DBMS_DATAPUMP.ADD_FILE( handle => hdnl, filename => 'logfile.log',
          directory => 'DATA_PUMP_DIR', filetype => dbms_datapump.ku$_file_type_log_file);
DBMS_DATAPUMP.METADATA_FILTER(hdnl,'SCHEMA_EXPR','IN (''SCHEMA1'', ''SCHEMA2'')');
DBMS_DATAPUMP.START_JOB(hdnl);
END;
/         

Wait for the process to complete, and verify that all components were exported by checking the log file.
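
If you decide to split the export across parallel workers, the same DBMS_DATAPUMP calls can be extended with SET_PARALLEL and a %U dump-file template, and WAIT_FOR_JOB makes the block wait for completion instead of polling. This is a minimal sketch, assuming the same schemas and DATA_PUMP_DIR as above; the degree of 4 is arbitrary.

set serveroutput on
DECLARE
hdnl NUMBER;
job_state VARCHAR2(30);
BEGIN
hdnl := DBMS_DATAPUMP.OPEN( operation => 'EXPORT', job_mode => 'SCHEMA', job_name=>null);
-- %U lets Data Pump create as many dump files as the parallel workers need.
DBMS_DATAPUMP.ADD_FILE( handle => hdnl, filename => 'datafile%U.dmp',
          directory => 'DATA_PUMP_DIR', filetype => dbms_datapump.ku$_file_type_dump_file);
DBMS_DATAPUMP.ADD_FILE( handle => hdnl, filename => 'logfile.log',
          directory => 'DATA_PUMP_DIR', filetype => dbms_datapump.ku$_file_type_log_file);
DBMS_DATAPUMP.METADATA_FILTER(hdnl,'SCHEMA_EXPR','IN (''SCHEMA1'', ''SCHEMA2'')');
DBMS_DATAPUMP.SET_PARALLEL(handle => hdnl, degree => 4);
DBMS_DATAPUMP.START_JOB(hdnl);
-- Block until the job finishes and print its final state.
DBMS_DATAPUMP.WAIT_FOR_JOB(hdnl, job_state);
DBMS_OUTPUT.PUT_LINE('Export finished with state: ' || job_state);
END;
/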

3. Transfer Datapump file(s) to RDS

Use DBMS_FILE_TRANSFER to transfer the dump file to RDS over the DB link created earlier.


BEGIN
DBMS_FILE_TRANSFER.PUT_FILE(
source_directory_object       => 'DATA_PUMP_DIR',           -- On-prem DATA_PUMP_DIR
source_file_name              => 'datafile.dmp',           -- Source filename
destination_directory_object  => 'DATA_PUMP_DIR',          -- RDS DATA_PUMP_DIR
destination_file_name         => 'datafile.dmp',           -- Target file name
destination_database          => 'TO_RDSNAME'              -- db_link name
);
END;
/

The file transfer may take a while. Verify completion of the transfer by comparing the file size on both sides.
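
On the on-prem side, one way to get the dump file's size from SQL (rather than from the OS) is UTL_FILE.FGETATTR; the number it returns can be compared with the size shown by the RDS LISTDIR query below. A small sketch, assuming the same directory object and file name as above:

set serveroutput on
DECLARE
l_exists    BOOLEAN;
l_length    NUMBER;
l_blocksize BINARY_INTEGER;
BEGIN
-- Returns existence, size in bytes and block size for a file in a directory object.
UTL_FILE.FGETATTR('DATA_PUMP_DIR', 'datafile.dmp', l_exists, l_length, l_blocksize);
DBMS_OUTPUT.PUT_LINE('datafile.dmp size in bytes: ' || l_length);
END;
/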

On RDS, display the files in DATA_PUMP_DIR using the RDS utility LISTDIR.


select * from table
    (rdsadmin.rds_file_util.listdir (p_directory => 'DATA_PUMP_DIR'))
    order by 4 desc;

4. Import into RDS.

Once the file transfer is complete, it's time to import into the RDS instance.

set serveroutput on size 1000000
DECLARE
hdnl NUMBER;
BEGIN
hdnl := DBMS_DATAPUMP.OPEN( operation => 'IMPORT', job_mode => 'SCHEMA', job_name=>null);

-- Datapump file to import
DBMS_DATAPUMP.ADD_FILE( handle => hdnl,
        filename => 'datafile.dmp',
        directory => 'DATA_PUMP_DIR',
        filetype => dbms_datapump.ku$_file_type_dump_file);

-- Log file to monitor progress        
DBMS_DATAPUMP.ADD_FILE( handle => hdnl,
        filename => 'awsimport.log',
        directory => 'DATA_PUMP_DIR',
        filetype => dbms_datapump.ku$_file_type_log_file);

-- Define schemas to import - same list used in export, unless planning to import a subset.

DBMS_DATAPUMP.METADATA_FILTER(hdnl,'SCHEMA_EXPR',
        'IN (''SCHEMA1'', ''SCHEMA2'')');

-- Start the job.        
DBMS_DATAPUMP.START_JOB(hdnl);
END;
/   

Import into a different schema

-- Add METADATA_REMAP (before START_JOB) to map the source schema to a new target schema
DBMS_DATAPUMP.METADATA_REMAP (hdnl,
                              'REMAP_SCHEMA',
                              'SOURCE_SCHEMA_NAME',
                              'TARGET_SCHEMA_NAME');

Transformation tasks (if needed)

-- Sample OID transformation script
-- By default, a Datapump import does not create new OIDs (object IDs). To create new OIDs during import, set OID to 0.
-- Equivalent to impdp transform=oid:0. Call this before START_JOB.

DBMS_DATAPUMP.METADATA_TRANSFORM (hdnl,
                               'OID',
                               0,
                               null);

Depending on the size of the import, this may take anywhere from a couple of minutes to several hours. For long-running jobs, use the queries in step 6 to track progress.

The log file created during import can be queried with the following statement.

select * from table(RDSADMIN.RDS_FILE_UTIL.READ_TEXT_FILE
         ('DATA_PUMP_DIR','awsimport.log'));

If we try to transfer the log file to a local system, the following error is usually encountered: ORA-27046: file size is not a multiple of logical block size. This happens because DBMS_FILE_TRANSFER requires files to be a multiple of the logical block size, and the log file usually is not. Use the query above to read the log file instead.

5. Validate import

After a successful import, verify object counts, row counts and any other cases you want to check (sums of amount columns, distinct values, etc.). I did two checks primarily: object count and row count.
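
For the column-level cases, aggregates can be compared over the same DB link used for the transfer. A hedged sketch, run on the on-prem database; SCHEMA1.TABLE1 and its AMOUNT column are hypothetical placeholders:

-- Compare an aggregate on both sides of the DB link; the two rows should match.
SELECT 'ON_PREM' AS side, COUNT(*) AS row_count, SUM(amount) AS total_amount
  FROM SCHEMA1.TABLE1
UNION ALL
SELECT 'RDS' AS side, COUNT(*), SUM(amount)
  FROM SCHEMA1.TABLE1@TO_RDSNAME;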

Object count

-- Check for missing objects in the schemas we imported (run on the on-prem database, using the DB link).

SELECT OWNER , OBJECT_TYPE, OBJECT_NAME FROM ALL_OBJECTS
 WHERE OWNER   IN ( 'SCHEMA1' ,'SCHEMA2','SCHEMA3')
MINUS
SELECT OWNER , OBJECT_TYPE, OBJECT_NAME FROM ALL_OBJECTS@TO_RDSNAME
 WHERE OWNER   IN ( 'SCHEMA1' ,'SCHEMA2','SCHEMA3');

I ran into a couple of missing objects - all were user-defined function-based indexes, and I had to create them manually. Some of the SYS% objects (constraints, interval partitions, etc. that Oracle creates and manages) may show up due to name differences. For those, I did a count at the object-type level and verified the import.


-- EXCLUDE SYS% objects and count again.
-- Run this on both databases and compare the output.

SELECT OWNER , OBJECT_TYPE, COUNT(1)
  FROM ALL_OBJECTS
 WHERE OWNER   IN ( 'SCHEMA1' ,'SCHEMA2','SCHEMA3')
   AND OBJECT_NAME NOT LIKE 'SYS%'
 GROUP BY OWNER, OBJECT_TYPE
 ORDER BY 1,2;
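
If you prefer not to compare the two outputs by hand, the same counts can be diffed in one statement over the DB link. A sketch, run on the on-prem database; it returns object-type counts that exist on-prem but are missing or differ on RDS.

SELECT OWNER, OBJECT_TYPE, COUNT(1) AS CNT
  FROM ALL_OBJECTS
 WHERE OWNER IN ( 'SCHEMA1' ,'SCHEMA2','SCHEMA3')
   AND OBJECT_NAME NOT LIKE 'SYS%'
 GROUP BY OWNER, OBJECT_TYPE
MINUS
SELECT OWNER, OBJECT_TYPE, COUNT(1) AS CNT
  FROM ALL_OBJECTS@TO_RDSNAME
 WHERE OWNER IN ( 'SCHEMA1' ,'SCHEMA2','SCHEMA3')
   AND OBJECT_NAME NOT LIKE 'SYS%'
 GROUP BY OWNER, OBJECT_TYPE;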

Another thing I missed importing was public synonyms - this was easy, as I recreated them in one go after all the imports.


-- Create a list of public synonyms. Spool the output and run it on the RDS instance.

SELECT 'CREATE PUBLIC SYNONYM '||
       SYNONYM_NAME||
       ' FOR '||
       TABLE_OWNER||'.'||TABLE_NAME||';'
 FROM dba_synonyms
WHERE table_owner in ( 'SCHEMA1' ,'SCHEMA2','SCHEMA3')
  AND owner = 'PUBLIC';

Count by table - prepare SQL to generate “count” statements and run them on both databases.


-- Generate an owner.table, count(1) statement for all tables in the specified schemas and union them.


SELECT 'SELECT '||''''||OWNER||'.'||
       TABLE_NAME||''''|| ' AS TABLENAME, COUNT(1) FROM '||
       OWNER||'.'||TABLE_NAME ||' UNION '
  FROM ALL_TABLES
 WHERE OWNER IN ( 'SCHEMA1' ,'SCHEMA2','SCHEMA3')
ORDER BY 1;

-- The result will look like this:


select 'SCHEMA1.TABLE1' as tablename, count(1) from SCHEMA1.TABLE1 union
select 'SCHEMA1.TABLE2' as tablename, count(1) from SCHEMA1.TABLE2 union
select 'SCHEMA2.TABLE11' as tablename, count(1) from SCHEMA2.TABLE11 union
select 'SCHEMA2.TABLE12' as tablename, count(1) from SCHEMA2.TABLE12 union

-- Remove the last "union" and add an order by clause. "union" can be "union all" as well.

select 'SCHEMA1.TABLE1' as tablename, count(1) from SCHEMA1.TABLE1 union
select 'SCHEMA1.TABLE2' as tablename, count(1) from SCHEMA1.TABLE2 union
select 'SCHEMA2.TABLE11' as tablename, count(1) from SCHEMA2.TABLE11 union
select 'SCHEMA2.TABLE12' as tablename, count(1) from SCHEMA2.TABLE12
order by 1;

-- Run this on both databases - export the results to Excel or to a CSV file for comparison. If the counts match, the import is successful. If the source database was active during the export, row counts can differ and have to be reconciled manually.

Successful import

The migration is successful once all validations are satisfactory. Shut down the old database, or limit access to it, to prevent users from continuing to use it.
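
One lightweight way to limit access on the old database is to lock the application accounts once the cutover is done. A sketch - SCHEMA1/SCHEMA2 are placeholders, and your access model may call for something different:

-- Lock the migrated application accounts on the old (on-prem) database.
ALTER USER SCHEMA1 ACCOUNT LOCK;
ALTER USER SCHEMA2 ACCOUNT LOCK;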

If the Datapump dump files are no longer needed, remove them to free up space.


-- Get the list of files in DATA_PUMP_DIR

select * from table
    (rdsadmin.rds_file_util.listdir(p_directory => 'DATA_PUMP_DIR'))
    order by 4 desc;

-- RDS - remove a file from DATA_PUMP_DIR

exec utl_file.fremove('DATA_PUMP_DIR','datafile.dmp');

6. Monitoring data pump import progress

Most of the time, the object count and row count queries helped me monitor the progress of the import. For one specific database/schema, the import took very long and felt slow, with no movement in the object count. I didn’t want to do row counts in the middle of the job as they may be unreliable. The import log was also moving slowly or appeared to be stuck.

Some of the queries below helped me monitor the progress (the queries are not mine - they were collected from AWS resources and via Google searches). I’m listing them here as they may help others as well.


-- List of datapump jobs
select * from dba_datapump_jobs
order by start_time desc;


select * from v$session_longops  --where sid  = <import job sid>
order by last_update_time desc;

-- Queries taking a long time - use v$session_longops to monitor progress

SELECT b.username, a.sid, b.opname, b.target, a.module,
       round(b.SOFAR*100/b.TOTALWORK,0) || '%' as "%DONE", b.TIME_REMAINING,
       to_char(b.start_time,'YYYY/MM/DD HH24:MI:SS') start_time,
       to_char(b.last_update_time,'YYYY/MM/DD HH24:MI:SS') last_update_time,
       to_char(b.sql_exec_start,'YYYY/MM/DD HH24:MI:SS') sql_exec_start
  FROM v$session_longops b, v$session a
 WHERE a.sid = b.sid
 ORDER BY 9 desc;

-- Querying the V$SESSION_LONGOPS and V$DATAPUMP_JOB views:
SELECT sl.sid, sl.serial#, sl.sofar, sl.totalwork, dp.owner_name, dp.state, dp.job_mode
     FROM v$session_longops sl, v$datapump_job dp
     WHERE sl.opname = dp.job_name
     AND sl.sofar != sl.totalwork;


-- Querying related views

select x.job_name,b.state,b.job_mode,b.degree
, x.owner_name,z.sql_text, p.message
, p.totalwork, p.sofar
, round((p.sofar/p.totalwork)*100,2) done
, p.time_remaining
from dba_datapump_jobs b
left join dba_datapump_sessions x on (x.job_name = b.job_name)
left join v$session y on (y.saddr = x.saddr)
left join v$sql z on (y.sql_id = z.sql_id)
left join v$session_longops p ON (p.sql_id = y.sql_id)
WHERE y.module='Data Pump Worker'
AND p.time_remaining > 0;


-- For any errors, you can also check the alert log and query the DBA_RESUMABLE view.

select * from dba_resumable;
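
On RDS there is no OS access to the alert log, but it can be read through the BDUMP directory object with the same rds_file_util calls used earlier. A sketch - the alert log file name depends on your instance's SID, so list BDUMP first to find it:

-- List the background dump directory to locate the alert log file.
select * from table
    (rdsadmin.rds_file_util.listdir(p_directory => 'BDUMP'))
    order by 4 desc;

-- Read it the same way as the import log; replace <sid> with your instance SID.
select * from table(RDSADMIN.RDS_FILE_UTIL.READ_TEXT_FILE
         ('BDUMP','alert_<sid>.log'));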


-- INSERT statements from V$SQLAREA
col table_name format a30

select substr(sql_text, instr(sql_text,'"')+1,
               instr(sql_text,'"', 1, 2)-instr(sql_text,'"')-1)
          table_name,
       rows_processed,
       round((sysdate
              - to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))
             *24*60, 1) minutes,
       trunc(rows_processed /
                ((sysdate-to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))
             *24*60)) rows_per_min
from
   v$sqlarea
where
  upper(sql_text) like 'INSERT % INTO "%'
  and
  command_type = 2
  and
  open_versions > 0;

-- Check for locks from other sessions.

set lines 200
set pages 50
col object1 for a30
col object2 for a30

select l.*, o1.owner || '.' || o1.object_name object1, o2.owner || '.' || o2.object_name object2
  from v$lock l
     , dba_objects o1
     , dba_objects o2
 where o1.object_id = l.id1
   and o2.object_id(+) = l.id2;

-- More V$SQL or GV$SQL queries to check elapsed time.

select elapsed_time/1000000 seconds, s.inst_id, s.sql_text, s.users_executing, s.cpu_time,
       s.first_load_time, s.last_load_time
  from gv$sql s
  where users_executing > 0
 union
 select elapsed_time/1000000 seconds, s.inst_id, s.sql_text, s.users_executing, s.cpu_time,
       s.first_load_time, s.last_load_time
  from gv$sql s
   where rownum < 50
 order by last_load_time;

To kill a session in RDS:

begin
    rdsadmin.rdsadmin_util.kill(
        sid    => <sid>,
        serial => <serial#>);
end;
/

-- Oracle Equivalent:

ALTER SYSTEM KILL SESSION '<sid>,<serial#>';