The MDM-RE Table Loader generates DBMS load files in delimited text format by default. Specify the
Loader- Definition
option
FIXED
to generate fixed length binary files instead.
To verify that the input data has been read and processed properly, you may specify the
Loader- Definition
option
Keep-Temp
. This will prevent the DBMS loader files from being deleted after the load completes so that they may be examined.
Flat-File Input
If input data is read from a flat-file, make sure that
the file contains fixed length records
FORMAT=Binary
is specified in the
Logical-File-Definition
the format of the input records matches the Input-View
the record length is the sum of the field lengths in the View
MSQ
Data loaded from a flat-file into
CHAR
or
VARCHAR
columns will be translated from the client’s code page into UNICODE, transferred to the server and then translated into the server’s code page and stored.
The client’s code page should be identical to the server’s code page, otherwise the conversion could be lossy.
If the character set of the data file does not match the client’s code page (Locale) specify the
DATABASEOPTIONS= IDTCP
parameter to specify the code page of the data.
User Source Table Input
During data extraction from a User Source Table’s
CHAR/VARCHAR
columns, the data is translated from the Server’s code page to the client’s code page (Table Loader). The client’s code page should be identical to the server’s code page; otherwise the conversion could be lossy.
Oracle
A safe approach is to use a UNICODE character set for the database and to specify a UNICODE character set for the client (MDM-RE Servers and utilities). MDM-RE automatically requests UNICODE data to be returned as UTF-16 .This ensures that no lossy conversions are performed when reading/writing data to/from the database.
The mass loader file generated by the Table Loader will automatically use a
Fixed
length format when UNICODE columns are present.
MSQ
NCHAR
and
NVARCHAR
columns are not converted. They remain in UNICODE format.
Target Column Size
The format and length of an IDT column defaults to the same values as the source column. In most cases, this is adequate. However, if the source and target databases do not use the same character set, it may be necessary to increase the size of the target column to accommodate the change of encoding.
For example, suppose a source column is defined as CHAR(5) encoded in a Central European character set such as Windows Code Page 1250. Suppose a particular row in the source table contains 5 bytes of data, with four of them being Latin characters (hexadecimal values <0x7F) and one of the characters being the Latin character A with an Acute (= 0xC1 = U+00C1).
When encoded in this character set, the data only requires 5 bytes. However, if it is now stored in an IDT on a database that uses UTF-8 as its character set, the data will be converted. The Latin characters will still only require one byte when expressed as UTF-8 but the
A + Acute
will be encoded using 2 bytes, with the total storage requirement being 6 bytes.
If the default column size is insufficient, use a length override in the definition of the target column to increase its size.