| Data conversion for Oracle databases |
This page describes some basic concepts for Oracle database data conversion.
What is a meta-script A meta script (or generic script) is a script that generates scripts, usually by extracting data or information from the system catalogue. A very simple example meta script statement would be:
SELECT 'SELECT * FROM DUAL;' FROM DUAL;
The above script will generate a new script:
SELECT * FROM DUAL;
Meta scripts are useful for porting data, creating install scripts and for simplifying programming in general by avoiding tedious reentering of information already known to the system, e g writing column names in tables. An example for enabling all primary key constraints in a schema may be:
SET HEADING OFF LINESIZE 900 TRIMSPOOL ON PAGESIZE 500
SELECT 'ALTER TABLE '||TABLE_NAME||
' ENABLE CONSTRAINT '||CONSTRAINT_NAME||';'
FROM USER_CONSTRAINTS WHERE CONSTRAINT_TYPE = 'P'
ORDER BY 1;
Meta scripts of higher orders are often more general and therefore more useful. An (quite unuseful) example of a higher order meta-script is:
Meta-scripts of higher orders SELECT 'SELECT ''SELECT * FROM DUAL;'' FROM DUAL;' FROM DUAL;
When executed this script will produce a script that, when executed will produce a script. A more prectical example of a meta-script of higher order is the script USERTABS.TXT that will create an extraction script for any table.
Many projects involve data conversion, i e mapping of old data to new data. Needless to say, it is of vital importance that the data owner (the customer) supplies a data conversion specification (column mappings) that describes how the data is to be converted.
Data conversion Mapping old column data into new data will serve the dual purpose of forcing the data owner to make proper decisions on data transformation, as well as setting focus on data quality. Very often there is "exceptions" in the data set - records that are specially treated, faulty records, records used only for testing and unintentionally left in the system and so on. Thus, the specification also includes specifying criterions for what records that simply shall be discarded (!).
The column mapping specification Data conversion may be performed using DDL statements, Stored procedures, DML and triggers or by using scripts
Data conversion methods DDL statements
Data conversion using DDL is carried out by gradually transforming the data step-by-step into a format suited for the new data representation. When dealing with large tables it is often wise to make the steps as small as possible. By joining data in tables in the old representation and creating new temporary tables (using CREATE TABLE TMP_1 AS SELECT . . . FROM . . .) and thereafter reducing the information in the temporary tables, the final representation is reached. These transformations may be looked upan as mostly manipulation of data sets. Do not forget to create indexes (CREATE INDEX . . .) on the temporary tables to speed up things. The operators UNION, MINUS, WHERE NOT IN, EXISTS is often used to reduce or expand data sets.
Stored procedures
Stored procedures may be used when the relation between the old data and the new data is complicated, or when there is a risk of exceeding the rollback segments during conversion. By creating a stored procedure, or a BEGIN-END block, advanced manipulations may be applied to each record in a selection. It is also possible to issue a COMMIT at regular intervals (e g every 200 records - an example of this is shown in the section describing conversion logs).
DML and triggers
Triggers may in some cases be used for advanced manipulations of complex data, in combination with DML (or DDL).
Scripts
Sometimes the scripts prowide a swift way of convering data. An example statement is provided below, where an select statement will generate a script that deletes all records in table T1 having the value 'ABC' in column COL1. By generating the script, and then inspecting the data for column COL2, it is easy to inspect what records will be deleted and possibly modify the generated script by hand according to the data conversion specification. The resulting script may then be pasted to the end of the conversion log.
select ' delete from T1
where ROWID = '''||ROWID||
''' AND COL2 = '''||COL2||''';' from T1 where COL1 = 'ABC';
The problem with multiple records are often found when one tries to enforce an integrity constraint on a table. If there exists duplicates may be verified using a statement like:
Removal of duplicates
select count(*) from the_table
group by column1, column2
having count(*) > 1;
The "duplicate records" may or may not be entirey equal in all columns. The important fact is however that the record values are equal in the columns intended for use in the constraint, e g a primary key. In finding these duplicates a selection must be made by GROUPING over all the columns in the new integrity constraint. The psudo-column ROWID is then used to separate what looks as equal records. Below is shown how this may be made using a script approach (giving more control over exactly what records are removed, because the script may be inspected before actually removing the records) and a DML approach.
Duplicate removal using scripts
Create a script using the following statement.
Select 'delete from the_table where rowid = '''||rowid||''';' From the_table where rowid not in (
select min(rowid)from the_table
group by column1, column2);
Duplicate removal using DML
The records may of cause be deleted directly using the DML-statement shown below:
delete from the_table where rowid not in ( select min(rowid)from the_table
group by column1, column2);
It is often appropriate to create a conversion log of a database conversion. In this way it is easy to check what transformations has been applied to the data. The conversion log may simply be a text document where all DDL and DML command applied is pasted in the proper sequence. The conversion log may be used to repeat the conversion (provided there is a database backup available). This is an example of a part of a conversion log (the BEGIN-END block at the end forces a commit every 200 records in order not to overload the available rollback segment(s)):
The conversion log
create table rfx_PRIVATWAGENSTAM as select WPS_CODE, WPS_MIETERBAU, wps_import
from PRIVATWAGENSTAM where WPS_MIETERBAU is not null;
alter table rfx_PRIVATWAGENSTAM add wps_thecode varchar2(12);
update rfx_PRIVATWAGENSTAM set wps_thecode = to_char(wps_code);
delete from rfx_PRIVATWAGENSTAM where wps_thecode is null;
commit;
alter table rfx_PRIVATWAGENSTAM add constraint idx_rfx_PRIVATWAGENSTAM
primary key (wps_thecode, wps_import);
-- *** NOW USE THE UPDATE TRIGGER FOR THE 6393 RECORDS ***
DECLARE
Cursor the_C is select wps_thecode, WPS_MIETERBAU, wps_import from rfx_PRIVATWAGENSTAM
order by wps_import;
cnt number(10) := 0;
BEGIN
for c in the_c loop
update bawagen set WAG_NUTZERBAU = c.WPS_MIETERBAU
where wag_code = c.wps_thecode;
cnt := cnt + 1;
if cnt > 200 then
cnt := 0;
insert into debug values ('OK - 200');
commit;
end if;
END LOOP;
Commit;
END;
/
A DDL statement (Data Definition Language) is a statement that manipulates objects (like tables, triggers, procedures) in a in a database. A database may be said to be a generic data structure, and a DDL statement will affect the "structure" of things in the database (e g creation or deletion of tables, indexes).
A DML statement (Data Manipulation Language) will affect data in the database, but not its "structure". This includes insertion or deletion of table records, updates of table columns and so on.