Implementing Oracle to Snowflake Synchronization Using Cloud Mass Ingestion Databases

Step 2: Configuring the Oracle source

Configure the Oracle source on the Source page of the database ingestion task wizard.
  1. In the Connection list, select the connection for the Oracle source that you configured and tested in Administrator for a runtime environment that your organization uses. The connection type appears in parentheses after the connection name.
    The list includes only the connection types that are valid for the load type selected on the Definition page. No connections are listed if you did not select a load type.
    After you deploy the ingestion task, you cannot change the connection without first undeploying the associated ingestion job. After you change the connection, you must deploy the task again.
  2. In the Schema list, select the source schema that includes the source tables.
    When you create a task that has an Oracle source, the schema name that is specified in the connection properties is displayed by default.
  3. Under Change Data Capture Method, set the CDC Method field to Log-based CDC.
    With this CDC method, inserts, updates, deletes, and column DDL changes are captured from the Oracle redo logs.
  4. Under Table Selection, select Select All to select all tables and columns in the schema for data replication. The Tables Selected field shows the count of all selected tables. You cannot edit the selection later under Table View.
    Fetching information for all of the tables might take a long time.
    Alternatively, you can use rules to define a subset of source tables to replicate.
  5. To use rules to select the source tables, make sure that the Select All check box is cleared and then add rules.
    When rule-based selection is used, you can refine the set of selected tables by table under Table View and also set an option for trimming spaces in character data.
    The default "Include *" rule selects all tables in the specified schema. To see how many tables are selected by this rule, click the Refresh icon to display the table count in Total Tables Selected, and click Apply Rules to see the table count in Table View.
    To add a rule:
    1. Under Rules, click the Add Rule (+) icon above the first table. A row is added to the table.
    2. In the Table Rule column, select Include or Exclude to create an inclusion or exclusion rule, respectively.
    3. In the Condition column, enter a table name or a table-name mask that includes one or more wildcards to identify the source tables to include in or exclude from table selection. Use the following guidelines:
      • A mask can contain one or both of the following wildcards: an asterisk (*) to represent one or more characters and a question mark (?) to represent a single character. A wildcard can occur multiple times in a mask value and can occur anywhere in the value.
      • The task wizard is case sensitive. Enter the table names or masks in the case with which the tables were defined.
      • Do not include delimiters such as quotation marks or brackets, even if the source database uses them. For example, Oracle requires quotation marks around lowercase and mixed-case names to force the names to be stored in lowercase or mixed case. However, in the task wizard, you must enter the lowercase or mixed-case names without quotation marks.
      • If a table name includes special characters such as a backslash (\), asterisk (*), dollar sign ($), caret (^), or question mark (?), escape each special character with a backslash (\) when you enter the rule.
    4. Define additional rules as needed.
      Rules are processed in the order in which they are listed, from top to bottom. Use the arrow icons to change the order.
    5. When finished, click Apply Rules.
      The Total Tables Selected and Tables Affected counts are updated.
      The following image shows multiple rules defined on the Source page:
      After you apply rules, if you add, delete, or change rules, you must click Apply Rules again. Click the Refresh icon to update the table counts. If you delete all rules without clicking Apply Rules, a validation error occurs at deployment, even if the Table View list still lists tables. If you switch to Select All, the rules are no longer in effect and are removed.
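The rule semantics described above can be sketched in code. The following Python model is illustrative only, and the function names `mask_to_regex` and `select_tables` are hypothetical, not part of the product: an asterisk matches one or more characters, a question mark matches exactly one character, a backslash escapes a literal special character, matching is case sensitive, and Include/Exclude rules are applied from top to bottom.

```python
import re

def mask_to_regex(mask: str) -> "re.Pattern[str]":
    """Translate a table-selection mask (* = one or more characters,
    ? = exactly one character) into an anchored regular expression.
    A backslash-escaped character in the mask is matched literally."""
    parts = []
    i = 0
    while i < len(mask):
        c = mask[i]
        if c == "\\" and i + 1 < len(mask):
            parts.append(re.escape(mask[i + 1]))  # escaped literal, e.g. \$
            i += 2
            continue
        if c == "*":
            parts.append(".+")      # * represents one or more characters
        elif c == "?":
            parts.append(".")       # ? represents a single character
        else:
            parts.append(re.escape(c))
        i += 1
    # Anchored and case sensitive, like the task wizard.
    return re.compile("^" + "".join(parts) + "$")

def select_tables(tables, rules):
    """Apply (action, mask) rules top to bottom; later rules win."""
    selected = set()
    for action, mask in rules:
        pattern = mask_to_regex(mask)
        for table in tables:
            if pattern.match(table):
                if action == "Include":
                    selected.add(table)
                else:                 # "Exclude"
                    selected.discard(table)
    return sorted(selected)
```

For example, under these assumed semantics, the rules `Include *` followed by `Exclude AUDIT*` would select every table except those starting with AUDIT.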
  6. To perform trim actions on character columns in the source tables selected based on rules, create column action rules.
    To create a column action rule:
    1. Click the Add Rule (+) icon above the second table.
    2. In the Action column, select one of the following options:
      • LTRIM. Trims spaces to the left of character column values.
      • RTRIM. Trims spaces to the right of character column values.
      • TRIM. Trims spaces to the left of and to the right of character column values.
    3. In the Condition column, enter a column name or a column-name mask that includes one or more asterisk (*) or question mark (?) wildcards. The value is matched against columns in the selected source tables to identify the columns to which the action applies.
    You can define multiple rules for different action types or for the same action type with different conditions. The rules are processed in the order in which they are listed, from top to bottom. Use the arrow icons to change the order.
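As a quick illustration of the three trim actions, the following sketch (the function name `apply_trim` is made up for this example) shows the effect of each action on a single character value:

```python
def apply_trim(action: str, value: str) -> str:
    """Apply one column action rule (LTRIM, RTRIM, or TRIM) to a
    character value, trimming space characters only."""
    if action == "LTRIM":
        return value.lstrip(" ")   # trim spaces to the left
    if action == "RTRIM":
        return value.rstrip(" ")   # trim spaces to the right
    if action == "TRIM":
        return value.strip(" ")    # trim spaces on both sides
    raise ValueError(f"unknown action: {action}")
```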
  7. Under Table View, view or edit the set of selected source tables and columns.
    If you selected Select All, the lists of tables and columns are view only.
    If you applied rules, you can refine the set of selected tables by clicking the check box next to individual tables. Deselect any tables that you do not want to replicate, or select additional items to replicate. Click the Refresh icon to update the selected tables count.
    You can also individually deselect or reselect the columns in a selected source table. To view or change the columns from which data will be replicated for a selected table, click the highlighted number of columns in the Columns column. The column names and data types are displayed to the right. By default, all columns are selected for a selected source table. To deselect or reselect a column, click the check box next to the column name. You cannot deselect a primary key column.
    The following image shows selected tables and the selected columns for the first table:
    Notes:
    • To search for tables, columns, or both, in the drop-down list above the columns list, select Table Name, Columns, or All, then enter a search string in the Find box and click Search. You can include a single asterisk (*) wildcard at the beginning or end of the string.
    • The first time you change a check box setting for a table or column, the rules are no longer in effect. The selections under Table View take precedence. However, if you click the Add Rule (+) icon again, any tables that you deselected or selected individually are reflected as new rules in the Rules list, and the rules once again take precedence. If you want to return to the Table View list, click Apply Rules again.
    • In the Table View section, the length of character columns is shown in bytes. Because the number of bytes per character depends on the character set encoding that the database uses, for some sources the actual column length in characters might differ from the number of bytes shown in the Table View.
    • If you select columns individually, the resulting set of columns is fixed and will not be updated by any schema change, regardless of schema drift settings. For example, if a source column is added or renamed, that column is silently excluded from CDC processing because it is not in the list of selected columns. However, if a selected column is dropped on the source, the schema drift Drop Column option controls how it is handled. The dropped column is not reflected in the list of columns until you apply rules again.
  8. If one or more of the selected source tables are not enabled for change data capture, you can generate a script for enabling CDC and then run or download the script.
    1. In the CDC Script field, select one of the following options:
      • Enable CDC for all columns. Enables CDC for all columns in the selected source tables.
        For source tables without a primary key, including any tables with unique indexes, CDC is enabled for all columns by default, regardless of which option is selected.
      • Enable CDC for primary key columns. Enables CDC only for primary key columns in the selected source tables.
      The script enables supplemental logging for all columns or primary key columns in the selected source tables to log additional information in the redo logs.
    2. To run the script, click Execute.
      If you do not have a database role or privilege that allows you to run the script, click the Download icon to download the script. The script file name has the format cdc_script_taskname_number.txt. Then ask your database administrator to run the script.
      Make sure the script runs before you run the database ingestion task.
    If you change the CDC Script option later and run the script again, the script first drops CDC for the original set of columns and then enables CDC for the current set of columns.
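To illustrate what such a script does, the following Python sketch emits the kind of Oracle supplemental-logging DDL involved. This is illustrative only: the wizard generates the actual script, which may differ, and the function name is invented for this example.

```python
def supplemental_logging_ddl(schema: str, tables, primary_key_only: bool = False):
    """Sketch of the DDL a CDC-enablement script contains: it adds a
    supplemental log group covering either all columns or only the
    primary key columns, so that Oracle writes the additional column
    data to the redo logs. Illustrative only; the wizard generates
    the real script."""
    scope = "PRIMARY KEY" if primary_key_only else "ALL"
    return [
        f'ALTER TABLE "{schema}"."{table}" ADD SUPPLEMENTAL LOG DATA ({scope}) COLUMNS;'
        for table in tables
    ]
```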
  9. To create and download a list of the source tables that match the table selection criteria, perform the following substeps:
    1. If you used rule-based table selection, in the List Tables by Rule Type list, select the type of selection rules that you want to use. Options are:
      • Include Rules Only
      • Exclude Rules Only
      • Include And Exclude Rules
    2. To include columns in the list, regardless of which table selection method you used, select the Include Columns check box.
    3. Click the Download icon.
      A downloaded list that includes columns has the following format:
      status,schema_name,table_name,object_type,column_name,comment
      The following table describes the information that is displayed in the downloaded list:
      status
      Indicates whether Mass Ingestion Databases excludes the source table or column from processing because it has an unsupported type. Valid values are:
      • E. The object is excluded from processing by an Exclude rule.
      • I. The object is included in processing.
      • X. The object is excluded from processing because it is an unsupported type of object. For example, unsupported types of objects include columns with unsupported data types and tables that include only unsupported columns. The comment field provides detail on unsupported types.
      schema_name
      Specifies the name of the source schema.
      table_name
      Specifies the name of the source table.
      object_type
      Specifies the type of the source object. Valid values are:
      • C. Column.
      • T. Table.
      column_name
      Specifies the name of the source column. This information appears only if you selected the Include Columns check box.
      comment
      Specifies the reason why a source object of an unsupported type is excluded from processing even though it matches the selection rules.
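Because the downloaded list is plain comma-separated text in the six-field format above, it can be post-processed with standard tooling. A minimal Python sketch (the function name `unsupported_objects` is made up) that extracts the objects excluded for unsupported types:

```python
import csv
import io

# Field order documented for the downloaded list.
FIELDS = ["status", "schema_name", "table_name", "object_type",
          "column_name", "comment"]

def unsupported_objects(list_text: str):
    """Return (schema, table, column, comment) tuples for rows with
    status X, that is, objects excluded because of an unsupported type."""
    reader = csv.DictReader(io.StringIO(list_text), fieldnames=FIELDS)
    return [(row["schema_name"], row["table_name"],
             row["column_name"], row["comment"])
            for row in reader if row["status"] == "X"]
```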
  10. Under Advanced, set the advanced properties that are available for combined initial and incremental load jobs with Oracle sources.
    Enable Persistent Storage
    Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed.
    Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job.
    Enable Partitioning
    Select this check box to enable partitioning of source objects. When an object is partitioned, the database ingestion job processes the records read from each partition in parallel.
    For Oracle sources, Mass Ingestion Databases determines the range of partitions by using the ROWID as the partition key.
    In combined initial and incremental loads, the partitioning of source objects occurs only in the initial load phase.
    Number of Partitions
    If you enable partitioning of source objects, enter the number of partitions that you want to create. The default number is 5. The minimum value is 2.
    Initial Start Point for Incremental Load
    Set this field if you want to customize the position in the source logs from which the database ingestion job starts reading change records the first time it runs. Options are:
    • Latest Available. The latest available position in the database log or structure.
    • Position. A valid Oracle SCN that is used to determine a position in the change stream from which to start retrieving change records. This value must be equal to or less than the current position value. An invalid value causes the job to fail. The default value is 0, which causes processing to start from the latest available position.
    • Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM|PM, that Mass Ingestion Databases uses to determine the position in the change stream from which to start retrieving change records. Mass Ingestion Databases retrieves only the changes that started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail.
    The default is Latest Available.
    Note the following points:
    • The Initial Start Point for Incremental Load option pertains only to the initial run of a job. Thereafter, if you resume a stopped or aborted job, the job begins propagating source data from where it last left off.
    • For combined initial and incremental load jobs, initial loading is not performed until the incremental processing of change data reaches the end of the current transaction log. For this reason, Informatica recommends that you select Latest Available as the start point.
      For Oracle combined initial and incremental load jobs, Oracle Flashback queries are used to get committed data that was current at a specific point in the change stream. Ensure that no source table is truncated during the initial load period. If truncation occurs, any DDL change performed during a flashback query causes the query to fail.
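The Specific Date and Time and Position values above can be sanity-checked before you enter them in the wizard. The following Python sketch uses the documented MM/DD/YYYY hh:mm AM|PM pattern; the helper names are hypothetical and the checks are illustrative, not product behavior.

```python
from datetime import datetime

START_POINT_FORMAT = "%m/%d/%Y %I:%M %p"   # MM/DD/YYYY hh:mm AM|PM

def parse_start_point(value: str) -> datetime:
    """Parse a Specific Date and Time start point. Raises ValueError
    for values that do not match the documented format."""
    return datetime.strptime(value, START_POINT_FORMAT)

def is_valid_scn(scn: int, current_scn: int) -> bool:
    """A Position start point must be 0 (start from the latest
    available position) or a positive SCN that is equal to or less
    than the current position."""
    return scn == 0 or 0 < scn <= current_scn
```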
  11. Under Custom Properties, you can specify custom properties at the direction of Informatica Global Customer Support to meet your special requirements. In this scenario, skip this step.
  12. Click Next.
