Table of Contents

Search

  1. Preface
  2. Introduction
  3. Installation
  4. Design
  5. Operation

Single UST

Single UST

The simplest form of IDT is created from column values extracted from a single User Source Table. The general form of the syntax is:
logical-file-definition *====================== NAME= lf-input PHYSICAL-FILE= "VirtualTable" COMMENT= "input from SQL database" FORMAT= SQL ... Section: User-Source-Tables CREATE_IDT VirtualTable,FileNo SOURCED_FROM [connection_string] source_clause [TRANSFORM transform_clause] [SELECT_BY select_clause] [NOSYNC] ;
where
VirtualTable
This is the name specified by the
PHYSICAL-FILE
parameter of the Logical-File-Definition.
FileNo
This is a unique file number associated with this IDT. It must be in the range 1-14999, although some
FileNo
s are reserved for internal use (2000-2010).
Connection String
The
connection_string
is used to specify connection information for access to the UST on the source database.
The format of the
connection_string
is as follows:
odb:99:Userid/Password@Service
where
Userid
DBMS user-id
Password
DBMS password
Service
This is the
Service_Name
defined in your
odbc.ini
file. Refer to the
Configuring ODBC
section for details.
For example,
odb:99:scott/tiger@ora920
specifies an Oracle host DBMS. DCE will connect to the DBMS identified as
"ora920"
using the user id "scott" and a password of "tiger". Any valid userid with SELECT privileges on the source tables may be used.
Source Clause
This section is used to specify the source of the data. The
source_clause
is used to nominate the UST columns that are to be extracted when creating the IDT. Source fields can be either added directly into the IDT or used in a transformation process, the result of which is added to the IDT.
The
source_clause
syntax is:
[src_schema.]table.column [tgt_fld [fmt(len)]] [,...]
where
src_schema
This is the name of the schema that the
table.column
belongs to. The default value is your userid.
table
The source table name that contains the column.
Oracle:
Synonyms (private, public, recursive) may be used but are converted to the real
schema.table
name during parsing.
column
The name of the source column to be extracted.
tgt_fld
The name of the IDT field. If omitted, it defaults to the value of
column
. If
tgt_fld
begins with a $, it is treated as a
virtual field
. Virtual fields are not stored in the IDT. They can only be referenced in a
transform_clause
.
fmt(len)
The format and length of the
tgt_fld
in the IDT. When omitted, they default to the format and length of the source column. Valid formats are defined in the
File Definition
section of this manual.
Transform Clause
A
transform_clause
is really an optional part of a
source_clause
or
merge_clause
. It is used to specify how virtual fields are to be combined/transformed and characteristics of the resulting field stored in the IDT.
The
transform_clause
syntax is:
transform_rule tgt_fld fmt(len) [order n] [,...]
where
transform_rule
Nominates the transformation process which will convert virtual fields into a value to be stored in
tgt_fld
.
tgt_fld
This is the name of the field to be stored in the IDT.
fmt(len)
This is the format and length of the
tgt_fld
.
order n
This is used to override the default order of fields in the IDT. Normally fields are placed in the IDT in the order of definition in the
source_clause
and
transform_clause
. You may override this order by nominating an integer value for n, starting from 1.
Transform Rules
Transform rules fall into three categories:
  1. rules that use no virtual fields (
    Source-Id
    ,
    Insert-Constant
    )
  2. rules that operate on a single virtual field (
    Upper
    ,
    Lower
    )
  3. rules that operate on many fields (
    Append
    ,
    Concat
    ,
    Insert-Field
    )
Each virtual field can only be referenced once in the
transform_clause
. In the unlikely event that the same source column is to be used in more than one transform clause, add an extra
source_clause
for that column but give it a different virtual field name.
A
transform_rule
is one of the following:
Source-Id Insert-Constant ("text") Upper (vf) Lower (vf) Insert-Field (vf, offset, ...) Append ( vf | lower (vf) | upper (vf) | "text" , ... ) Concat ( vf | lower (vf) | upper (vf) | "text" , ... )
where
Source_Id
This is a transform that generates a unique ID into
tgt_fld
. It should only be used in conjunction with the
AUTO-ID
option. The name of the
tgt_fld
must match that defined in
AUTO-ID-FIELD
in the
IDX-Definition
section.
Insert-Constant
This will inject a string of text into
tgt_fld
.
vf
This is the name of a virtual field defined in the
source_clause
(including the leading ’$’).
Upper
This will convert the virtual field
vf
to upper case.
Lower
This will convert the virtual field
vf
to lower case.
Insert-Field
This is a transform that will combine a number of virtual fields into
tgt_fld
. Each field is stored at the specified offset in
tgt_fld
. Offsets start from 0.
Append
This is a transform which will combine virtual fields and text by stripping trailing spaces from all fields and joining them together into
tgt_fld
.
Concat
This is the same as
Append
but will add a single space between the fields when joining them.
Select Clause
Is any valid SQL expression that can be used to select a
subset of records
from the UST. Data Clustering Engine does not parse this expression. It is simply added to the WHERE clause generated by DCE. Therefore it is important to ensure its correctness, otherwise a run-time SQL error will occur in the Loader.
Do not try to limit the number of rows loaded from the UST using a physical limit. For example,
select_by ROWNUM <= 1000
This approach instructs the SQL Optimizer to return the first 1000 rows but has the disadvantage that the "first 1000 rows" may not be the same ones when the Project is loaded a second time. It will produce inconsistent results.
Use a logical limit when selecting a subset of records from the UST. For example,
select_by EmpId >= 50000 AND EmpId <= 51000
This ensures a repeatable set of records.
NOSYNC Clause
DCE doesn’t support data source synchronisation. Only the data that is present in the UST(s) at the time the Loader is started will end up in the clustering IDT. The
NOSYNC
keyword is optional.
Example
CREATE_IDT SQLinput,1 SOURCED_FROM ora:99:scott/tiger@server734 SCOTT.EMP.EMPNO EmpNo, SCOTT.EMP.ENAME EmployeeName SELECT_BY scott.emp.empno > 7800 ;
The example will create a virtual input table named
SQLinput
. This table can then be referred to in a
logical-file-definition
as the logical input file for a clustering job:
logical-file-definition *====================== NAME= lf-input PHYSICAL-FILE= "SQLinput" COMMENT= "input from Oracle database" FORMAT= SQL AUTO-ID-NAME= JobN
Data will be extracted from an Oracle service named
server734
using the userid
scott
whose password is
tiger
. The IDT will contain two fields,
EmpNo
and
EmployeeName
. They will have identical formats and lengths as their corresponding source fields,
EMPNO
and
ENAME
taken from table
SCOTT.EMP
. Only employees with employee numbers greater than 7800 will be extracted and loaded into the IDT.

0 COMMENTS

We’d like to hear from you!