
Application Ingestion and Replication

Configure a Microsoft Azure Data Lake Storage Gen2 target

Define target properties for the destination that you selected on the Destination page.
  1. Under Target Properties, define the following required Microsoft Azure Data Lake Storage Gen2 target properties:
    Output Format
    Select the format of the output file. Options are:
    • CSV
    • AVRO
    • PARQUET
    The default value is CSV.
    Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
    Add Headers to CSV File
    If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
    Parquet Compression Type
    If PARQUET is selected as the output format, you can select a compression type that Parquet supports. Options are:
    • None
    • Gzip
    • Snappy
    The default value is None, which means no compression is used.
    Avro Format
    If AVRO is selected as the output format, select the format of the Avro schema that will be created for each source table. Options are:
    • Avro-Flat. This Avro schema format lists all Avro fields in one record.
    • Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields.
    • Avro-Nested. This Avro schema format organizes each type of information in a separate record.
    The default value is Avro-Flat.
    Avro Serialization Format
    If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
    • None
    • Binary
    • JSON
    The default value is Binary.
    Avro Schema Directory
    If AVRO is selected as the output format, specify the local directory where Application Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern:
    schemaname_tablename.txt
    If this directory is not specified, no Avro schema definition file is produced.
    File Compression Type
    Select a file compression type for output files in CSV or AVRO output format. Options are:
    • None
    • Deflate
    • Gzip
    • Snappy
    The default value is None, which means no compression is used.
    Avro Compression Type
    If AVRO is selected as the output format, select an Avro compression type. Options are:
    • None
    • Bzip2
    • Deflate
    • Snappy
    The default value is None, which means no compression is used.
    Deflate Compression Level
    If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
    Add Directory Tags
    For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
    Task Target Directory
    For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use under the parent directory specified in the connection properties.
    This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
    Data Directory
    For initial load tasks, define a directory structure for the directories where Application Ingestion and Replication stores output data files and optionally stores the schema.
    The default directory pattern is {TableName}_{Timestamp}.
    To customize the directory pattern, click the Edit icon to select from the following listed path types and values:
    • Folder Path. Enter a folder name or use variables to create a folder name.
    • Timestamp values. Select data elements Timestamp, yy, yyyy, mm, or dd. The Timestamp values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target.
    • Schema Name. Select SchemaName, toUpper(SchemaName), or toLower(SchemaName).
    • Table Name. Select TableName, toUpper(TableName), or toLower(TableName).
    If you manually enter the directory expression, ensure that you enclose placeholders with curly brackets { }. Placeholder values are not case sensitive.
    For example:
    myDir1/{SchemaName}/{TableName}
    myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp}
    myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}
    For a sketch of how these placeholders resolve, see the example after this property list.
    For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files.
    The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data.
    To customize the directory pattern, click the Edit icon to select from the following listed path types and values:
    • Folder Path. Enter {TaskTargetDirectory} for a task-specific base directory on the target to use instead of the folder path specified in the connection properties.
    • Timestamp values. Select data elements Timestamp, yy, yyyy, mm, or dd. The Timestamp values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started.
    • Schema Name. Select SchemaName, toUpper(SchemaName), or toLower(SchemaName).
    • Table Name. Select TableName, toUpper(TableName), or toLower(TableName).
    For Amazon S3 and Microsoft Azure Data Lake Storage Gen2 targets, Application Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Application Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties. For Amazon S3 targets with Open Table format, the data directory field is not applicable. Enabling the Connection Directory as Parent option includes the connection directory before the warehouse base path. If it is disabled, files are saved directly under the warehouse base directory.
    Connection Directory as Parent
    Select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. For initial load tasks, the parent directory is used in the Data Directory and Schema Directory. For incremental load and combined initial and incremental load tasks, the parent directory is used in the Data Directory, Schema Directory, Cycle Completion Directory, and Cycle Contents Directory.
    This check box is selected by default. If you clear it, for initial loads, define the full path to the output files in the Data Directory field. For incremental loads, optionally specify a root directory for the task in the Task Target Directory field.
    Schema Directory
    Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, are shown in a list for your convenience. This field is optional.
    For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema.
    You can use the same placeholders as for the Data Directory field. If you manually enter placeholders, ensure that you enclose them with curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, for example: {toLower(SchemaName)}
    The schema is written only for output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
    Cycle Completion Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
    Cycle Contents Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
    Use Cycle Partitioning for Data Directory
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory.
    If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
    Use Cycle Partitioning for Summary Directories
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
    List Individual Files in Contents
    For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory.
    If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as for timestamp or date.
    If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.
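    The following Python sketch shows how a directory pattern of the kind described above could resolve at run time. It is not Informatica code: the resolution logic is an assumption for illustration, and only the placeholder names and the yyyymmdd_hhmissms timestamp format come from this documentation.
      # Minimal illustration, assuming the documented placeholders:
      # {SchemaName}, {TableName}, {Timestamp}, {YYYY}, {YY}, {MM}, {DD},
      # {TaskTargetDirectory}, {toUpper(...)}, and {toLower(...)}.
      import re
      from datetime import datetime, timezone

      def resolve_pattern(pattern: str, schema: str, table: str,
                          task_target_dir: str = "mytask") -> str:
          now = datetime.now(timezone.utc)
          values = {
              "SchemaName": schema,
              "TableName": table,
              "TaskTargetDirectory": task_target_dir,
              # yyyymmdd_hhmissms: date, then time with milliseconds.
              "Timestamp": now.strftime("%Y%m%d_%H%M%S%f")[:-3],
              "YYYY": now.strftime("%Y"),
              "YY": now.strftime("%y"),
              "MM": now.strftime("%m"),
              "DD": now.strftime("%d"),
          }

          def expand(match: re.Match) -> str:
              token = match.group(1)
              call = re.fullmatch(r"(toUpper|toLower)\((\w+)\)", token)
              if call:
                  func, name = call.groups()
                  return values[name].upper() if func == "toUpper" else values[name].lower()
              for name, value in values.items():
                  if name.lower() == token.lower():  # placeholders are not case sensitive
                      return value
              raise ValueError(f"unknown placeholder: {token}")

          return re.sub(r"\{([^{}]+)\}", expand, pattern)

      # One of the documented example patterns:
      print(resolve_pattern("myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}",
                            "SALES", "ORDERS"))
      # e.g. myDir1/sales/ORDERS_20250101_103045123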
  2. To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use (an illustrative example of the resulting metadata columns follows this property list):
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target.
    For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert.
    By default, this check box is selected for incremental load and initial and incremental load jobs, and cleared for initial load jobs.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target.
    For initial loads, the job always writes the current date and time.
    By default, this check box is not selected.
    Add Orderable Sequence
    Select this check box to add a metadata column that records a combined epoch value and an incremental numeric value for each change operation that the job inserts into the target tables. The sequence value is always ascending, but it is not guaranteed to be sequential, and gaps may exist. The sequence value identifies the order of activity in the target records.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to include UNDO data in the output that a job writes to the target.
    For initial loads, the job writes nulls.
    By default, this check box is not selected.
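    The following sketch shows what incremental-load output rows could look like with all three metadata options enabled. It is not Informatica output: the metadata column names (OP_TYPE, OP_TIME, SEQUENCE) and the sequence construction are assumptions for illustration; only the "I"/"U"/"D" values, the operation timestamp, and the ascending epoch-plus-counter behavior come from this documentation.
      import csv
      import io
      import time

      def orderable_sequence(epoch: int, counter: int) -> int:
          # Combined epoch value and incremental counter: always ascending,
          # but not guaranteed to be gap-free.
          return epoch * 1_000_000 + counter

      epoch = int(time.time())
      buf = io.StringIO()
      writer = csv.writer(buf, quoting=csv.QUOTE_ALL)  # fields enclosed in double quotes
      writer.writerow(["OP_TYPE", "OP_TIME", "SEQUENCE", "ORDER_ID", "STATUS"])
      writer.writerow(["I", "2024-05-01 10:15:00", orderable_sequence(epoch, 1), 1001, "NEW"])
      writer.writerow(["U", "2024-05-01 10:16:30", orderable_sequence(epoch, 2), 1001, "SHIPPED"])
      writer.writerow(["D", "2024-05-01 10:20:12", orderable_sequence(epoch, 3), 1001, ""])
      print(buf.getvalue())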
  3. Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon, enter a source table name or name mask, and enter a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table. A sketch of this matching logic follows the notes below.
    You can enter multiple rules.
    Notes:
    • If you enter the wildcard for a source table mask, you must also enter the wildcard for a target table mask.
    • If a table name includes special characters, such as a backslash (\), asterisk (*), dot (.), or question mark (?), escape each special character in the name with a backslash (\).
    • On Windows, if you enter target table renaming criteria that causes a target table name to exceed 232 characters in length, the name is truncated to 222 characters. Data Ingestion and Replication appends 14 characters to the name to add a date-time yyyyMMddHHmmss value, which can cause the name to exceed the Windows maximum limit of 255 characters. Ensure that the names of any renamed target tables will not exceed 232 characters.
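    The following sketch shows how one renaming rule of the kind described above can be applied. It is not Informatica code: the function is hypothetical, and for simplicity it treats the asterisk as the only special character in a mask.
      import re

      def apply_rule(source_mask: str, target_mask: str, table: str) -> str | None:
          # Turn the source mask into a regex: each "*" captures a run of characters.
          pattern = "^" + re.escape(source_mask).replace(r"\*", "(.*)") + "$"
          match = re.match(pattern, table)
          if match is None:
              return None  # the rule does not apply to this table
          # Re-insert each captured run at the matching "*" in the target mask.
          result = target_mask
          for captured in match.groups():
              result = result.replace("*", captured, 1)
          return result

      # The example from the text: prefix every source table name with "PROD_".
      print(apply_rule("*", "PROD_*", "ORDERS"))  # -> PROD_ORDERS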
  4. Under Custom Properties, you can specify one or more custom properties that Informatica provides to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select the Custom option and manually enter both the property name and value.
    Specify these properties only at the direction of Informatica Global Customer Support. Usually, these properties address unique environments or special processing needs. You can specify multiple properties, if necessary. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
  5. Click Next to proceed, or click Save.
