My daily workflow is receiving flat files, normally pipe delimited, loading them into staging tables, and then either upserting into the main table or using partition exchange. Since the input files vary widely, so do the options used - anywhere from calling functions to parse the fields to defaulting values or skipping them when no longer in use. Below are a few methods I use frequently.
The SQLLDR utility is commonly used to fast-load flat files into a staging table, after which PL/SQL or other programming languages update the final tables. Recent releases of Oracle also have external tables, which use the SQL*Loader engine internally to provide seamless access to the underlying data.
Let’s call my sqlldr control file my_big_loader.ctl. To run it from the command prompt:

sqlldr control=my_big_loader.ctl userid=user/pass@db

If userid is omitted, sqlldr will prompt for the missing info.
Commonly used options and variations are listed below; not every possibility is covered.
options (rows=1000, errors=1000) -- run in conventional path
options (rows=1000, errors=1000, direct=true) -- run in direct path
options (SKIP_UNUSABLE_INDEXES=TRUE) -- allows loading into a table when indexes are in an unusable state. Unique indexes/constraints will still throw an error.
Direct path load is good for loading large files, but it has its drawbacks as well. It skips the conventional SQL path and writes database blocks directly, and it can also be parallelized.
unrecoverable – when run in direct path, redo log generation is omitted. Check your backup and recoverability options when using this.
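As a hedged sketch, a direct-path, parallel, unrecoverable load might be set up like this in the control file (the file, table, and field names are hypothetical):

```
options (direct=true, parallel=true, errors=1000)
unrecoverable
load data
infile 'my_large_file.dat'
append -- parallel direct-path loads require append
into table my_staging_table
fields terminated by "|"
( field1, field2 )
```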
Input and error files:
load data
infile my_large_file.dat -- file to load
badfile my_bad_file.bad -- bad records stored here (optional)
discardfile my_discards.dsc -- discarded records go here (optional)
infile can be * if the data is specified inline within the control file (with BEGINDATA). If no path is specified, the input file is expected in the current directory. Full paths can be specified for the bad and discard files as well.
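A minimal sketch of an inline-data control file, assuming a hypothetical two-column staging table:

```
load data
infile *
append
into table my_staging_table
fields terminated by "|"
( field1, field2 )
begindata
100|first row
200|second row
```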
When there are errors, bad records are written to the badfile until the max error count (errors, as specified in options) is reached. After hitting that limit, the load stops.
load data can specify a characterset if the file differs from the default:
load data -- default characterset
load data CHARACTERSET UTF8 -- use UTF8 as characterset
load data CHARACTERSET TH8TISASCII -- Thai
load data CHARACTERSET WE8ISO8859P9 -- Western European 8859
When a characterset is specified, the data being loaded is assumed to be in that encoding. After loading, the data is stored in the DB character set. This is particularly useful when dealing with files from a variety of data sources in different markets.
One of my load programs parsed the main input file and split it into around 7 different data files depending on the country code of each record. Each file was then loaded into the DB with a different characterset specified. It was hard to maintain (adding a field required changes to all 7 files, etc.).
A final solution, after moving the DB to UTF8, was to load them all as UTF8 into a staging table and use Oracle’s convert function twice to fix them up (first from UTF8 to the actual characterset of the file, and then back to UTF8). Once all transfer systems could handle UTF8 or a similar characterset, these problems went away.
select CONVERT (CONVERT (in_text, 'WE8ISO8859P1', 'UTF8'), 'UTF8', 'WE8ISO8859P1') from DUAL;
Specify the table to be loaded:

append/insert/replace/truncate -- one of these
into table my_staging_table
fields terminated by "|"
trailing nullcols
Choose one of the load methods:
truncate is fastest for full loads.
append adds data to the existing table.
insert inserts into the table, but the table must be empty.
replace acts like delete/insert; delete triggers will fire if defined, and it is a full-table operation, not a subset. Use the right method for the job and read the restrictions on their use (primary keys/foreign keys, unique keys, etc.).
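Putting these pieces together, a minimal full-reload control file might look like this sketch (all file, table, and field names are hypothetical):

```
options (rows=1000, errors=1000)
load data
infile 'my_large_file.dat'
badfile 'my_bad_file.bad'
truncate
into table my_staging_table
fields terminated by "|" trailing nullcols
( field1, field2, field3 )
```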
I commonly use append for adding small batch files and truncate to bulk load data into the staging area.
Appropriate privileges to load data into the table should exist.
fields terminated by can be specified using any valid character form (ASCII, hex, octal, etc.). If the records are fixed-width, the POSITION option can be used for each field to specify its offset and length. Additionally, optionally enclosed by can be specified if the data is quoted, e.g. optionally enclosed by '"'.
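A sketch of both forms, with hypothetical field names (X'7C' is the hex code for the pipe character):

```
-- delimited records: terminator given in hex, values may be quoted
fields terminated by X'7C' optionally enclosed by '"'
-- fixed-width records: specify each field's position instead
( field1 position(1:10) char,
  field2 position(11:18) char )
```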
Occasionally, the source records end early, with fewer fields than the field list defines; trailing nullcols allows the load to proceed, treating the missing trailing fields as null.
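A small sketch with hypothetical names showing the effect:

```
fields terminated by "|" trailing nullcols
( id, name, optional_note )
-- a record like "100|widget" (no third field) loads with optional_note = null
```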
SQLLDR allows loading into multiple tables. For this, the into table option can specify the table name with a condition, like below:
into table order_header when recordid='H' -- header records
into table order_detail when recordid='L' -- order details or line items
into table order_payment when recordid='P' -- payment records
fields...
( recordid filler position(1) char -- position of the load identifier
)
Define the fields in order:
( field1,
  field2 NULLIF field2="NULL",
  skip_field3 filler, -- filler - this field is skipped
  field4 DATE 'DDMMYYYY', -- use the format specified
  field5
)
More options example:
( field_time DATE 'YYYYMMDDHH24MISS',
  field_date DATE 'DDMMYYYY',
  field_seq_number recnum, -- rownum equivalent while loading the file
  field_normal_column,
  field_def_date sysdate, -- default to sysdate - not in file
  field_constant_num constant 0, -- default value - not in file
  field_constant_alpha constant 'a', -- default value
  field_skip_in_file filler, -- skip this field
  field_call_user_function "schema.pkg.function(:field_call_user_function)",
  field_substr_function "substr(:field_substr_function,1,8)",
  field_using_decode "decode(:field_using_decode, NULL, 'ABC', :field_using_decode)",
  field_last_day_of_mth "last_day(to_date(:field_last_day_of_mth,'DDMMYYYY'))" -- last day of month
)