User Guide

10.5
- 10.5 HotFix 3
- 10.5 HotFix 2
- 10.5 HotFix 1
- 10.2 HotFix 1
- 10.2
- 10.1
- 10.0 HotFix 1
- 10.0

Back Next

Single UST

The simplest form of IDT is created from column values extracted from a single User Source Table. The general form of the syntax is:


logical-file-definition
*======================
NAME=               lf-input
PHYSICAL-FILE=      "VirtualTable"
COMMENT=            "input from SQL database"
FORMAT=             SQL
...
Section: User-Source-Tables

CREATE_IDT VirtualTable,FileNo
SOURCED_FROM [connection_string]
source_clause
[TRANSFORM transform_clause]
[SELECT_BY select_clause]
[NOSYNC]
;

where

VirtualTable: This is the name specified by the
PHYSICAL-FILE
parameter of the Logical-File-Definition.

FileNo: This is a unique file number associated with this IDT. It must be in the range 1-14999, although some
FileNo
s are reserved for internal use (2000-2010).

Connection String

The

connection_string

is used to specify connection information for access to the UST on the source database.

The format of the

connection_string

is as follows:

odb:99:Userid/Password@Service

where

Userid: DBMS user-id

Password: DBMS password

Service: This is the
Service_Name
defined in your
odbc.ini
file. Refer to the
Configuring ODBC
section for details.

For example,
odb:99:scott/tiger@ora920
specifies an Oracle host DBMS. DCE will connect to the DBMS identified as
"ora920"
using the user id "scott" and a password of "tiger". Any valid userid with SELECT privileges on the source tables may be used.

Source Clause

This section is used to specify the source of the data. The

source_clause

is used to nominate the UST columns that are to be extracted when creating the IDT. Source fields can be either added directly into the IDT or used in a transformation process, the result of which is added to the IDT.

The

source_clause

syntax is:

[src_schema.]table.column [tgt_fld [fmt(len)]] [,...]

where

src_schema: This is the name of the schema that the
table.column
belongs to. The default value is your userid.

table: The source table name that contains the column.

Oracle:: Synonyms (private, public, recursive) may be used but are converted to the real
schema.table
name during parsing.

column: The name of the source column to be extracted.

tgt_fld: The name of the IDT field. If omitted, it defaults to the value of
column
. If
tgt_fld
begins with a $, it is treated as a
virtual field
. Virtual fields are not stored in the IDT. They can only be referenced in a
transform_clause
.

fmt(len): The format and length of the
tgt_fld
in the IDT. When omitted, they default to the format and length of the source column. Valid formats are defined in the
File Definition
section of this manual.

Transform Clause

transform_clause

is really an optional part of a

source_clause

merge_clause

. It is used to specify how virtual fields are to be combined/transformed and characteristics of the resulting field stored in the IDT.

The

transform_clause

syntax is:

transform_rule tgt_fld fmt(len) [order n] [,...]

where

transform_rule: Nominates the transformation process which will convert virtual fields into a value to be stored in
tgt_fld
.

tgt_fld: This is the name of the field to be stored in the IDT.

fmt(len): This is the format and length of the
tgt_fld
.

order n: This is used to override the default order of fields in the IDT. Normally fields are placed in the IDT in the order of definition in the
source_clause
and
transform_clause
. You may override this order by nominating an integer value for n, starting from 1.

Transform Rules

Transform rules fall into three categories:

rules that use no virtual fields (

Source-Id

Insert-Constant

)

rules that operate on a single virtual field (

Upper

Lower

)

rules that operate on many fields (

Append

Concat

Insert-Field

)

Each virtual field can only be referenced once in the

transform_clause

. In the unlikely event that the same source column is to be used in more than one transform clause, add an extra

source_clause

for that column but give it a different virtual field name.

transform_rule

is one of the following:


Source-Id
Insert-Constant ("text")
Upper (vf)
Lower (vf)
Insert-Field (vf, offset, ...)
Append ( vf | lower (vf) | upper (vf) | "text" , ... )
Concat ( vf | lower (vf) | upper (vf) | "text" , ... )

where

Source_Id: This is a transform that generates a unique ID into
tgt_fld
. It should only be used in conjunction with the
AUTO-ID
option. The name of the
tgt_fld
must match that defined in
AUTO-ID-FIELD
in the
IDX-Definition
section.

Insert-Constant: This will inject a string of text into
tgt_fld
.

vf: This is the name of a virtual field defined in the
source_clause
(including the leading ’$’).

Upper: This will convert the virtual field
vf
to upper case.

Lower: This will convert the virtual field
vf
to lower case.

Insert-Field: This is a transform that will combine a number of virtual fields into
tgt_fld
. Each field is stored at the specified offset in
tgt_fld
. Offsets start from 0.

Append: This is a transform which will combine virtual fields and text by stripping trailing spaces from all fields and joining them together into
tgt_fld
.

Concat: This is the same as
Append
but will add a single space between the fields when joining them.

Select Clause

Is any valid SQL expression that can be used to select a subset of records
from the UST. Data Clustering Engine does not parse this expression. It is simply added to the WHERE clause generated by DCE. Therefore it is important to ensure its correctness, otherwise a run-time SQL error will occur in the Loader.

Do not try to limit the number of rows loaded from the UST using a physical limit. For example,

select_by ROWNUM <= 1000

This approach instructs the SQL Optimizer to return the first 1000 rows but has the disadvantage that the "first 1000 rows" may not be the same ones when the Project is loaded a second time. It will produce inconsistent results.

Use a logical limit when selecting a subset of records from the UST. For example,

select_by EmpId >= 50000 AND EmpId <= 51000

This ensures a repeatable set of records.

NOSYNC Clause

DCE doesn’t support data source synchronisation. Only the data that is present in the UST(s) at the time the Loader is started will end up in the clustering IDT. The

NOSYNC

keyword is optional.

Example


CREATE_IDT
        SQLinput,1
SOURCED_FROM ora:99:scott/tiger@server734
        SCOTT.EMP.EMPNO   EmpNo,
        SCOTT.EMP.ENAME   EmployeeName
SELECT_BY
        scott.emp.empno > 7800
;

The example will create a virtual input table named

SQLinput

. This table can then be referred to in a

logical-file-definition

as the logical input file for a clustering job:


logical-file-definition
*======================
NAME=           lf-input
PHYSICAL-FILE=  "SQLinput"
COMMENT=        "input from Oracle database"
FORMAT=         SQL
AUTO-ID-NAME=   JobN

Data will be extracted from an Oracle service named

server734

using the userid

scott

whose password is

tiger

. The IDT will contain two fields,

EmpNo

and

EmployeeName

. They will have identical formats and lengths as their corresponding source fields,

EMPNO

and

ENAME

taken from table

SCOTT.EMP

. Only employees with employee numbers greater than 7800 will be extracted and loaded into the IDT.

Rename Saved Search

Table of Contents

User Guide

User Guide

Single UST

Single UST