Step 2: Configuring the Db2 for i source

In the

Connection

list, select the connection for the Db2 for i source that you configured and tested in

Administrator

for a runtime environment that your organization uses. The connection type appears in parentheses after the connection name.

The list includes only the connection types that are valid for the load type selected on the

Definition

page. No connections are listed if you did not select a load type.

After you deploy the ingestion task, you cannot change the connection without first undeploying the associated ingestion job. After you change the connection, you must deploy the task again.

In the

Schema

list, select the source schema that includes the source tables.

The list includes only the schemas that are available in the database accessed with the specified source connection.

In the

Journal Name

field, select the name of the journal that records the changes made to the source tables.

Under

Table Selection

, select

Select All

only if you want to select all tables and columns in the schema for data replication. You cannot edit the selection in subsequent fields.

The

Tables Selected

field shows the count of all selected tables. The interface fetches information about all of the tables.

Fetching information for all of the tables might take a long time.

The following image shows the

Source

page when

Select All

is selected:

Alternatively, you can use rules to define a subset of source tables to replicate.

To use rules to select the source tables, make sure that the

Select All

check box is cleared and then add rules.

When rule-based selection is used, you can refine the set of selected tables by table under

Table View

and also set an option for trimming spaces in character data.

The default "Include *" rule selects all tables in the specified schema. To see how many tables are selected by this rule, click the Refresh icon to display the table count in

Total Tables Selected

and click

Apply Rules

to see the table count in

Table View

.

To add a rule:

Under

Rules

, click the Add Rule (+) icon above the first table. A row is added to the table.

In the

Table Rule

column, select

Include

or

Exclude

to create an inclusion or exclusion rule, respectively.

In the

Condition

column, enter a table name or a table-name mask that includes one or more wildcards to identify the source tables to include in or exclude from table selection. Use the following guidelines:

A mask can contain one or both of the following wildcards: an asterisk (*) wildcard to represent one or more characters and a question mark (?) wildcard to represent a single character. A wildcard can occur multiple times in a mask value and can occur anywhere in the value.

The task wizard is case sensitive. Enter the table names or masks in the case with which the tables were defined.

Do not include delimiters such as quotation marks or brackets, even if the source database uses them.

If a table name includes special characters such as a backslash (\), asterisk(*), dollar sign ($), caret (^), or question mark (?), escape each special character with a backslash (\) when you enter the rule.

Define additional rules as needed.

The rules will be processed in the order in which they're listed, from top to bottom. Use the arrow icons to change the order.

When finished, click

Apply Rules

.

The

Total Tables Selected

and

Tables Affected

counts are updated.

The following image shows multiple rules defined on the

Source

page:

After you apply rules, if you add, delete, or change rules, you must click

Apply Rules

again. Click the Refresh icon to update the table counts. If you delete all rules without clicking

Apply Rules

, a validation error will occur at deployment, even if the

Table View

list still lists tables. If you switch to

Select All

, the rules are no longer in effect and disappear.

To perform trim actions on character columns in the source tables selected based on rules, create column action rules.

To create a column action rule:

Click the Add Rule (+) icon above the second Action table under

Rules

.

In the

Action

column, select one of the following options:

LTRIM

. Trims spaces to the left of character column values.

RTRIM

. Trims spaces to the right of character column values.

TRIM

. Trims spaces to the left of and to the right of character column values.

In the

Condition

column, enter a column name or a column name mask that includes one or more asterisk (*) or question mark (?) wildcards. The value is matched against columns in the selected source tables to identify the columns to which the action applies.

You can define multiple rules for different action types or for the same action type with different conditions. The rules are processed in the order in which they're listed, from top to bottom. Use the arrow icons to change the order.

Under

Table View

, view or edit the set of selected source tables and columns.

If you selected

Select All

, the lists of tables and columns are view only.

If you applied rules, you can refine the set of selected tables by clicking the check box next to individual tables. Deselect any tables that you do not want to replicate, or select additional items to replicate. Click the Refresh icon to update the selected tables count.

The following image shows selected tables:

Notes:

To search for tables or columns or both, in the drop-down list above the columns list, select

Table Name

,

Columns

, or

All

and then enter a search string in the Find
box and click

Search

. You can include a single asterisk (*) wildcard at the beginning or end of the string.

The first time you change a check box setting for a table or column, the rules are no longer in effect. The selections under

Table View

take precedence. However, if you click the Add Rule (+) icon again, any tables that you deselected or selected individually are reflected as new rules in the Rules list and the rules once again take precedence. If you want to return to the

Table View

list, click

Apply Rules

again.

In the

Table View

section, the length of character columns is shown in bytes. Since the number of bytes per character differs depending on the character set encoding used by the database, for some sources, the actual column length in number of characters might be different than the number of bytes shown in the

Table View

.

If one or more of the selected source tables are not enabled for change data capture, you can generate a script for enabling CDC and then run or download the script.

In the

CDC Script

field, select

Enable CDC for all columns

to enable CDC for all columns in the selected source tables. This option is the only available option for Db2 for i sources.

The script enables journaling on the Db2 for i source tables.

To run the script, click

Execute

.

If you do not have a database role or privilege that allows you to run the script, click the Download icon to download the script. The script file name has the following format:

cdc_script_taskname_number.txt

. Then ask your database administrator to run the script.

Make sure the script runs before you run the database ingestion task.

If you change to the

CDC Script

option later and run the script again, the script first drops CDC for the original set of columns and then enables CDC for the current set of columns.

To create and download a list of the source tables that match the table selection criteria, perform the following substeps:

If you used rule-based table selection, in the

List Tables by Rule Type

list, select the type of selection rules that you want to use. Options are:

Include Rules Only

Exclude Rules Only

Include And Exclude Rules

To include columns in the list, regardless of which table selection method you used, select the

Include Columns

check box.

Click the Download icon.

A downloaded list that includes columns has the following format:

status,schema_name,table_name,object_type,column_name,comment

The following table describes the information that is displayed in the downloaded list:

Field	Description
status	Indicates whether Mass Ingestion Databases excludes the source table or column from processing because it has an unsupported type. Valid values are: E . The object is excluded from processing by an Exclude rule. I . The object is included in processing. X . The object is excluded from processing because it is an unsupported type of object. For example, unsupported types of objects include columns with unsupported data types and tables that include only unsupported columns. The comment field provides detail on unsupported types.
schema_name	Specifies the name of the source schema.
table_name	Specifies the name of the source table.
object_type	Specifies the type of the source object. Valid values are: C . Column. T . Table.
column_name	Specifies the name of the source column. This information appears only if you selected the Columns check box.
comment	Specifies the reason why a source object of an unsupported type is excluded from processing even though it matches the selection rules.

Under

Advanced

, set the advanced properties that are available for incremental load jobs with Db2 for i sources:

Property	Description
Enable Persistent Storage	Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed. Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job.
Initial Start Point for Incremental Load	Set this field if you want to customize the position in the source logs from which the database ingestion job starts reading change records the first time it runs. Options are: Earliest Available . The start of the current journal with changes made to the Db2 for i source table. Latest Available . The latest available position in the database log. Position . A valid timestamp that is used to determine a position in the change stream from which to start retrieving change records. The timestamp must be equal to or less than the current timestamp. An invalid value will cause the job to fail. For Db2 for i sources, do not use the default value of 0. Specific Date and Time . A date and time, in the format MM/DD/YYYY hh:mm AM\|PM , that Mass Ingestion Databases uses to determine the position in the change stream from which to start retrieving change records. Mass Ingestion Databases retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail. The default is Latest Available . The Initial Start Point for Incremental Load option pertains only to the initial run of a job. Thereafter, if you resume a stopped or aborted job, the job begins propagating source data from where it last left off.

Property

Description

Enable Persistent Storage

Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed.

Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job.

Initial Start Point for Incremental Load

Set this field if you want to customize the position in the source logs from which the

database ingestion

job starts reading change records the first time it runs.

Options are:

Earliest Available

. The start of the current journal with changes made to the Db2 for i source table.

Latest Available

. The latest available position in the database log.

Position

. A valid timestamp that is used to determine a position in the change stream from which to start retrieving change records. The timestamp must be equal to or less than the current timestamp. An invalid value will cause the job to fail. For Db2 for i sources, do not use the default value of 0.

Specific Date and Time

. A date and time, in the format MM/DD/YYYY

hh:mm AM|PM

, that

Mass Ingestion Databases

uses to determine the position in the change stream from which to start retrieving change records.

Mass Ingestion Databases

retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail.

The default is

Latest Available

.

The

Initial Start Point for Incremental Load

option pertains only to the initial run of a job. Thereafter, if you resume a stopped or aborted job, the job begins propagating source data from where it last left off.

Under

Custom Properties

, you can specify custom properties at the direction of Informatica Global Customer Support to meet your special requirements. In this scenario, skip this step.

Click

Table of Contents

Ingesting Db2 for i Change Data into Snowflake with Cloud Mass Ingestion Databases

Ingesting Db2 for i Change Data into Snowflake with Cloud Mass Ingestion Databases