Application Ingestion and Replication

Finalize the task definition
Almost done! On the Let's Go! page, complete a few more properties. Then you can Save and Deploy the task.
  1. Under General Properties, set the following properties:
     Task Name
     Enter a name that you want to use to identify the application ingestion and replication task if you do not want to use the generated name. A descriptive name makes the task easier to find later.
     Task names can contain Latin alphanumeric characters, spaces, periods (.), commas (,), underscores (_), plus signs (+), and hyphens (-). Task names cannot include other special characters. Task names are not case sensitive. The maximum length is 50 characters.
     If you include spaces in the task name, the spaces do not appear in the corresponding job name after you deploy the task.
     Location
     The project or project\folder in Explore that will contain the task definition. If you do not specify a project, the "Default" project is used.
     Runtime Environment
     Select the runtime environment that you want to use to run the task. By default, the runtime environment that you entered when you began defining the task is displayed. You can keep this runtime environment or select another one. To refresh the list of runtime environments, click Refresh.
     The runtime environment can be a Secure Agent group that consists of one or more Secure Agents. A Secure Agent is a lightweight program that runs tasks and enables secure communication.
     Alternatively, for application ingestion and replication initial load jobs with selected source types, you can use a serverless runtime environment hosted on Microsoft Azure. You cannot choose a serverless runtime environment if a local runtime environment was previously selected. The Cloud Hosted Agent is not supported.
     Select Set as default to use the specified runtime environment as the default for all tasks you create. Otherwise, leave this check box cleared.
     Description
     Optionally, enter a description for the task. The maximum length is 4,000 characters.
     Schedule
     If you want to run an initial load task based on a schedule instead of starting it manually, select Run this task based on a schedule. Then select a schedule that was previously defined in Administrator. The default option is Do not run this task based on a schedule.
     This field is not available for incremental load and combined initial and incremental load tasks.
     To view and edit the schedule options, go to Administrator. If you edit the schedule, the changes apply to all jobs that use the schedule. If you edit the schedule after deploying the task, you do not need to redeploy the task.
     If the schedule criteria for running the job are met but the previous job run is still active, Application Ingestion and Replication skips the new job run.
     Execute in Taskflow
     Select this check box to make the task available in Data Integration to add to a taskflow as an event source. You can then include transformations in the taskflow to transform the ingested data. Available for initial load and incremental load tasks with Snowflake targets that don't use the Superpipe option.
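As a quick illustration, the documented task-name rules can be expressed as a validation check. This is a sketch based on my reading of the rules above (Latin alphanumerics, spaces, the listed punctuation, maximum 50 characters), not a product API:

```python
import re

# Characters allowed per the documentation: Latin alphanumerics, spaces,
# periods, commas, underscores, plus signs, and hyphens; max length 50.
TASK_NAME_RE = re.compile(r"^[A-Za-z0-9 .,_+-]{1,50}$")

def is_valid_task_name(name: str) -> bool:
    """Return True if the name satisfies the documented constraints."""
    return bool(TASK_NAME_RE.fullmatch(name))

print(is_valid_task_name("Orders_initial-load.v2"))  # True
print(is_valid_task_name("bad*name"))                # False
```

Note that case insensitivity and the space-stripping in job names are server-side behaviors; the check above only covers the allowed-character and length rules.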
  2. To display advanced properties, toggle on Show Advanced Options.
  3. Optionally, edit the value in the Number of Rows in Output File field to specify the maximum number of rows that the application ingestion and replication task writes to an output file.
     The Number of Rows in Output File field is not displayed for jobs that have an Apache Kafka target or that use the Superpipe option for a Snowflake target.
     Valid values are 1 through 100000000. The default value for Amazon S3, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Infrastructure (OCI) Object Storage targets is 1000 rows. For the other targets, the default value is 100000 rows.
     For incremental load and combined initial and incremental load operations, change data is flushed to the target either when the specified number of rows is reached or when the flush latency period expires and the job is not in the middle of processing a transaction. The flush latency period is the time that the job waits for more change data before flushing data to the target. The latency period is fixed at 10 seconds and cannot be changed.
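The flush rule described above can be sketched as a simple decision function. This is an illustration of the documented behavior, not the actual product logic; the function name and parameters are my own:

```python
FLUSH_LATENCY_SECS = 10          # fixed per the documentation, not configurable
ROWS_PER_OUTPUT_FILE = 100000    # "Number of Rows in Output File" default for most targets

def should_flush(buffered_rows: int, secs_since_last_flush: float,
                 in_open_transaction: bool) -> bool:
    """Flush when the row limit is hit, or when the 10-second latency
    expires while the job is not mid-transaction."""
    if buffered_rows >= ROWS_PER_OUTPUT_FILE:
        return True
    if secs_since_last_flush >= FLUSH_LATENCY_SECS and not in_open_transaction:
        return True
    return False
```

The second branch models the documented condition that a latency-based flush waits for any in-flight transaction to finish first.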
  4. For initial load jobs only, optionally clear the File Extension Based on File Type check box if you want the output data files for Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, or Microsoft Fabric OneLake targets to have the .dat extension. This check box is selected by default, which causes the output files to have file-name extensions based on their file types.
     For incremental load jobs with these target types, this option is not available. Application Ingestion and Replication always uses output file-name extensions based on file type.
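The extension behavior amounts to a two-way choice. A minimal sketch, assuming for illustration that the file type name maps directly to its extension (the actual mapping is the product's):

```python
def output_extension(file_type: str, extension_based_on_file_type: bool) -> str:
    """If the check box is selected, name files by type; otherwise use .dat."""
    return f".{file_type.lower()}" if extension_based_on_file_type else ".dat"

print(output_extension("csv", True))   # .csv
print(output_extension("csv", False))  # .dat
```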
  5. Optionally, configure an apply cycle. An apply cycle is a cycle of applying change data that starts with fetching the intermediate data from the source and ends with the commit of the data to the target. For continuous replication, the source processes the data in multiple low-latency apply cycles.
     For application ingestion and replication incremental load tasks that have Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, or Microsoft Fabric OneLake targets, you can configure the following apply cycle options:
     Apply Cycle Interval
     Specifies the amount of time that must elapse before an application ingestion and replication job ends an apply cycle. You can specify days, hours, minutes, and seconds, or specify values for a subset of these time fields and leave the other fields blank. The default value is 15 minutes.
     Apply Cycle Change Limit
     Specifies the number of records that must be processed before an application ingestion and replication job ends an apply cycle. When this record limit is reached, the ingestion job ends the apply cycle and writes the change data to the target. The default value is 10000 records.
     During startup, jobs might reach this limit more frequently than the apply cycle interval if they need to catch up on processing a backlog of older data.
     Low Activity Flush Interval
     Specifies the amount of time, in hours, minutes, or both, that must elapse during a period of no change activity on the source before an application ingestion and replication job ends an apply cycle. When this time limit is reached, the ingestion job ends the apply cycle and writes the change data to the target. If you do not specify a value for this option, an application ingestion and replication job ends apply cycles only after either the Apply Cycle Change Limit or the Apply Cycle Interval limit is reached. No default value is provided.
     • Either the Apply Cycle Interval or the Apply Cycle Change Limit field must have a non-zero value or use the default value.
     • An apply cycle ends when the job reaches any of the three limits, whichever limit is met first.
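The "whichever limit is met first" rule can be sketched as follows. The option names come from the documentation; the data structure and function are illustrative only:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ApplyCycleState:
    secs_elapsed: float        # time since the apply cycle started
    records_processed: int     # change records handled in this cycle
    secs_idle: float           # time with no change activity on the source

def cycle_should_end(state: ApplyCycleState,
                     interval_secs: float = 15 * 60,   # Apply Cycle Interval default
                     change_limit: int = 10000,        # Apply Cycle Change Limit default
                     low_activity_secs: Optional[float] = None) -> bool:
    """The cycle ends as soon as any configured limit is reached."""
    if state.secs_elapsed >= interval_secs:
        return True
    if state.records_processed >= change_limit:
        return True
    # Low Activity Flush Interval has no default; it applies only if set.
    if low_activity_secs is not None and state.secs_idle >= low_activity_secs:
        return True
    return False
```

This also shows why Low Activity Flush Interval is optional: with it unset, only the interval and change-limit checks can end a cycle.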
  6. For incremental load jobs that have an Apache Kafka target, configure the following Checkpoint Options:
     Checkpoint All Rows
     Indicates whether an application ingestion and replication job performs checkpoint processing for every message that is sent to the Kafka target. If this check box is selected, the Checkpoint Every Commit, Checkpoint Row Count, and Checkpoint Frequency (secs) options are ignored.
     Checkpoint Every Commit
     Indicates whether an application ingestion and replication job performs checkpoint processing for every commit that occurs on the source.
     Checkpoint Row Count
     Specifies the maximum number of messages that an application ingestion and replication job sends to the target before adding a checkpoint. If you set this option to 0, the job does not perform checkpoint processing based on the number of messages. If you set this option to 1, the job adds a checkpoint for each message.
     Checkpoint Frequency (secs)
     Specifies the maximum number of seconds that must elapse before an application ingestion and replication job adds a checkpoint. If you set this option to 0, the job does not perform checkpoint processing based on elapsed time.
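How the four checkpoint options interact can be sketched as a per-message decision. The option names are the documented ones; the function shape and the per-message evaluation are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class CheckpointOptions:
    checkpoint_all_rows: bool = False
    checkpoint_every_commit: bool = False
    checkpoint_row_count: int = 0       # 0 disables count-based checkpoints
    checkpoint_frequency_secs: int = 0  # 0 disables time-based checkpoints

def should_checkpoint(opts: CheckpointOptions, msgs_since_ckpt: int,
                      secs_since_ckpt: float, at_source_commit: bool) -> bool:
    if opts.checkpoint_all_rows:
        # Per the documentation, this overrides the other three options.
        return True
    if opts.checkpoint_every_commit and at_source_commit:
        return True
    if opts.checkpoint_row_count > 0 and msgs_since_ckpt >= opts.checkpoint_row_count:
        return True
    if opts.checkpoint_frequency_secs > 0 and secs_since_ckpt >= opts.checkpoint_frequency_secs:
        return True
    return False
```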
  7. Under Schema Drift Options, if the detection of schema drift is supported for your source and target combination, specify the schema drift option to use for each of the supported types of DDL operations.
     The Schema Drift Options section appears only for incremental load and combined initial and incremental load tasks, and only for sources that support automatic detection of schema changes.
     The following schema drift options are available for each DDL operation type:
     Ignore
     Do not replicate DDL changes that occur on the source database to the target.
     Replicate
     Allow the application ingestion and replication job to replicate the DDL changes to the target.
     • Add Field operations that add a primary-key field are not supported and might cause unpredictable results.
     • Modify Field operations that change the NULL or NOT NULL constraint of a field are not replicated to the target.
     The supported types of DDL operations are:
     • Add Column
     • Modify Column
     • Drop Column
     • Rename Column
     Application ingestion and replication jobs don't support modifying or renaming columns for Google BigQuery targets, or adding columns for Oracle targets.
     Stop Job
     Stop the application ingestion and replication job.
     Stop Table
     Stop processing the source object on which the DDL change occurred.
     When one or more objects are excluded from replication because of the Stop Table schema drift option, the status of the job changes to Running with Warning. The application ingestion and replication job cannot retrieve the data changes that occurred on the source object after the job stops processing the changes, which leads to data loss on the target. To avoid data loss, you must resynchronize the source and target objects that the job stopped processing before you resume the application ingestion and replication job.
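The four schema drift actions amount to a dispatch on the configured option for each DDL operation type. A sketch of that dispatch, using the documented option names (the function and return strings are illustrative, not product behavior descriptions):

```python
DRIFT_OPTIONS = {"Ignore", "Replicate", "Stop Job", "Stop Table"}

def handle_schema_drift(operation: str, configured_action: str) -> str:
    """Describe what the job does when a DDL operation is detected."""
    if configured_action not in DRIFT_OPTIONS:
        raise ValueError(f"unknown schema drift option: {configured_action}")
    if configured_action == "Ignore":
        return f"{operation}: not replicated to the target"
    if configured_action == "Replicate":
        return f"{operation}: DDL change replicated to the target"
    if configured_action == "Stop Job":
        return f"{operation}: job stopped"
    # Stop Table: only the affected source object is excluded; the job
    # continues with the Running with Warning status.
    return f"{operation}: source object excluded; job runs with warning"
```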
  8. Under Custom Properties, you can specify one or more custom properties that Informatica provides to meet your special requirements. To add a property, enter the property name and value in the Create Property field, and then click Add Property.
     Specify these properties only at the direction of Informatica Global Customer Support. Usually, these properties address unique environments or special processing needs. You can specify multiple properties, if necessary. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
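The property-name rule above is stricter than the task-name rule (no spaces, commas, or plus signs). A sketch of the check, again my reading of the documented rule rather than a product API:

```python
import re

# Allowed per the documentation: alphanumerics, periods, hyphens, underscores.
PROPERTY_NAME_RE = re.compile(r"^[A-Za-z0-9._-]+$")

def is_valid_property_name(name: str) -> bool:
    return bool(PROPERTY_NAME_RE.fullmatch(name))

print(is_valid_property_name("custom.option_1"))  # True
print(is_valid_property_name("custom option"))    # False (space not allowed)
```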
  9. Click Save to save the task.
  10. Click Deploy to deploy a job instance for the task, or click View to view or edit the task.
     You can run a job that has the status of Deployed from the My Jobs page.
