SQL*Loader (SQLLDR):
My daily workflow involves receiving flat files, normally pipe delimited, loading them into staging tables, and then either upserting into the main table or using partition exchange. Because the input files vary widely, so do the options used - anywhere from calling functions to parse the fields, to defaulting values, to skipping fields no longer in use. Below are a few methods I use frequently.
The SQLLDR utility is commonly used to fast-load flat files into a staging table, after which PL/SQL or other programming languages update the final tables. Recent releases of Oracle provide external tables, which use the SQL*Loader engine internally (via the ORACLE_LOADER access driver) to give seamless access to the underlying data.
Let’s call my sqlldr control file my_big_loader.ctl. To run it, at the command prompt:
sqlldr control=my_big_loader.ctl userid=user/pass@db
If userid is omitted, sqlldr will prompt for the missing info.
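A few other command-line parameters come in handy. The sketch below uses the standard log, bad, skip, and load parameters (the file names are the ones from this post); it assumes an Oracle client is installed, so treat it as a sketch rather than something to paste verbatim:

```shell
# write the session log and bad records to explicitly named files
sqlldr userid=user/pass@db control=my_big_loader.ctl log=my_big_loader.log bad=my_big_loader.bad

# skip the first record (e.g. a header row) and load at most 100 rows
sqlldr userid=user/pass@db control=my_big_loader.ctl skip=1 load=100
```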
Commonly used options and variations are listed below; not every possibility is covered.
file: my_big_loader.ctl
Specify options:
options (rows=1000, errors=1000) -- run in conventional path
options (rows=1000, errors=1000, direct=true) -- run in direct path.
options (SKIP_UNUSABLE_INDEXES=TRUE) -- allows loading into table when indexes are in unusable state. Unique indexes/constraints will still throw an error.
Direct path load is good for loading large files but has its drawbacks as well. It skips the conventional SQL path and formats database blocks directly, and it can also be parallelized.
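A parallel direct-path load is typically run as multiple concurrent sqlldr sessions, each with its own control file pointing at its own slice of the data. A sketch (the part1/part2 file names are hypothetical):

```shell
sqlldr userid=user/pass@db control=load_part1.ctl direct=true parallel=true &
sqlldr userid=user/pass@db control=load_part2.ctl direct=true parallel=true &
wait
```

Note that index maintenance is restricted during parallel direct-path loads, so indexes are usually rebuilt afterward; check the documentation for the full list of restrictions.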
Redo logs:
unrecoverable – when run in direct path, redo log generation is skipped for the load. Check your backup and recoverability options when using this.
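In the control file, unrecoverable goes right before load data and applies only to direct-path loads. A sketch reusing the names from this post:

```
options (rows=1000, errors=1000, direct=true)
unrecoverable load data
infile my_large_file.dat
append
into table my_staging_table
fields terminated by "|" trailing nullcols
(field1, field2)
```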
Input and error files:
load data
infile my_large_file.dat -- file to load
badfile my_bad_file.bad -- Bad records stored here (optional)
discardfile my_discards.dsc -- Discarded records go here (optional)
infile can be * if the data is specified inline (within the control file, after BEGINDATA). If no path is specified, the input file is expected in the current directory. Full paths can be specified for the bad and discard files as well.
When there are errors, bad records are written to the specified badfile until the maximum error count (errors in options) is reached; at that point the load stops.
load data can specify a character set if the file differs from the database NLS_CHARACTERSET.
load data -- default characterset
load data CHARACTERSET UTF8 -- use UTF8 as characterset
load data CHARACTERSET TH8TISASCII -- Thai
load data CHARACTERSET WE8ISO8859P9 -- ISO 8859-9 (Latin-5)
When characterset is specified, the data being loaded is assumed to be in that character set. After loading, the data is stored in the database character set. This is particularly useful when dealing with files from a variety of data sources in different markets.
One of my load programs parsed the main input file and split it into around 7 data files depending on the country code of each record. Each file was then loaded into the DB with a different characterset specified. This was hard to maintain (adding a field required changes in all 7 files, etc.).
The final solution, after moving the DB to UTF8, was to load everything as UTF8 into a staging table and use Oracle’s CONVERT function twice to fix the data: first from UTF8 to the actual character set of the file, and then back to UTF8. Once all transfer systems could handle UTF8, these problems went away.
select CONVERT(CONVERT(in_text, 'WE8ISO8859P1', 'UTF8'),
               'UTF8', 'WE8ISO8859P1')
from DUAL;
Specify the table to be loaded:
append/insert/replace/truncate -- one of these
into table my_staging_table
fields terminated by "|" trailing nullcols
Choose one of the load methods. truncate is fastest for full loads. append adds data to the existing table. insert inserts into the table, but the table must be empty. replace acts like delete-then-insert: triggers fire if defined on delete, and it is a full-table operation, not a subset. Use the right method for the job, and read the restrictions on their use (primary keys, foreign keys, unique keys, etc.).
I commonly use append to add small batch files and truncate to bulk load data into the staging area.
Appropriate privileges to load data into the table must exist. The fields terminated by delimiter can be specified in any valid character form (ASCII, hex, octal, etc.). If the records are of fixed length, the POSITION option can be used on each field to specify where it starts and ends. Additionally, an optionally enclosed by clause can be specified if the data is quoted, like optionally enclosed by '"'.
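For a fixed-length file, each field carries a position(start:end) instead of relying on a delimiter. A sketch with hypothetical column boundaries:

```
load data
infile my_fixed_width.dat
truncate
into table my_staging_table
(
  field1 position(1:10)  char,
  field2 position(11:18) date 'DDMMYYYY',
  field3 position(19:26) decimal external
)
```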
Occasionally, the source records have fewer fields than the field list; trailing nullcols allows the load to proceed by treating the missing trailing columns as null.
SQLLDR allows loading into multiple tables. For this, the into table clause can specify the table name with a when condition, like below:
into table order_header when recordid='H' -- header records
into table order_detail when recordid='L' -- order details or line items
into table order_payment when recordid='P' -- payment record
fields...
(recordid filler position(1) char, -- position of the load identifier
)
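In full control-file syntax, each into table clause carries its own when condition and field list, and the filler field at position(1) also resets scanning to the start of the record for the second and later tables. A fleshed-out sketch of the layout above (the column names are hypothetical):

```
load data
infile my_orders.dat
append
into table order_header when recordid = 'H'
fields terminated by "|" trailing nullcols
(recordid filler position(1) char, order_no, order_date DATE 'DDMMYYYY')
into table order_detail when recordid = 'L'
fields terminated by "|" trailing nullcols
(recordid filler position(1) char, order_no, line_no, item_code)
into table order_payment when recordid = 'P'
fields terminated by "|" trailing nullcols
(recordid filler position(1) char, order_no, payment_amt)
```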
Define the fields in order:
(
field1 ,
field2 NULLIF field2="NULL",
skip_field3 filler, -- filler - this field is skipped
field4 DATE 'DDMMYYYY', -- use the format specified
field5
)
More options example:
(
field_time DATE 'YYYYMMDDHH24MISS',
field_date DATE 'DDMMYYYY',
field_seq_number recnum, -- Rownum equivalent while loading the file
field_normal_column ,
field_def_date sysdate, -- default to sysdate - not in file
field_constant_num constant 0, -- default value - not in file
field_constant_alpha constant "a", -- default value
field_skip_in_file filler, -- skip this field
field_call_user_function "schema.pkg.function(:field_call_user_function)",
field_substr_function "substr(:field_substr_function,1,8)",
field_using_decode "decode(:field_using_decode, NULL, 'ABC', :field_using_decode)",
field_last_day_of_mth "last_day(to_date(:field_last_day_of_mth,'DDMMYYYY'))" -- last day
)
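Putting several of these pieces together, a minimal complete control file might look like the sketch below (the staging table and column names are hypothetical; the file names and pipe delimiter follow the examples above):

```
options (rows=1000, errors=1000)
load data
infile my_large_file.dat
badfile my_bad_file.bad
discardfile my_discards.dsc
truncate
into table my_staging_table
fields terminated by "|" optionally enclosed by '"' trailing nullcols
(
  order_no,
  order_date   DATE 'DDMMYYYY',
  load_seq     recnum,                     -- sequence assigned while loading
  load_time    sysdate,                    -- defaulted, not in the file
  country_code "nvl(:country_code, 'US')", -- default via SQL expression
  comments_fld filler                      -- present in the file but skipped
)
```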