Application Ingestion and Replication

Finalize the task definition
Almost done! On the Let's Go! page, complete a few more properties. Then you can Save and Deploy the task.
  1. Under General Properties, set the following properties:
     Task Name
     Enter a name that you want to use to identify the application ingestion and replication task if you do not want to use the generated name. A descriptive name makes the task easier to find later.
     Task names can contain Latin alphanumeric characters, spaces, periods (.), commas (,), underscores (_), plus signs (+), and hyphens (-). Task names cannot include other special characters. Task names are not case sensitive. The maximum length is 50 characters.
     If you include spaces in the task name, the spaces do not appear in the corresponding job name after you deploy the task.
     Location
     The project or project\folder in Explore that will contain the task definition. If you do not specify a project, the "Default" project is used.
     Runtime Environment
     Select the runtime environment that you want to use to run the task. By default, the runtime environment that you entered when you began defining the task is displayed. You can keep this runtime environment or select another one. To refresh the list of runtime environments, click Refresh.
     The runtime environment can be a Secure Agent group that consists of one or more Secure Agents. A Secure Agent is a lightweight program that runs tasks and enables secure communication.
     Alternatively, for application ingestion and replication initial load jobs with selected source types, you can use a serverless runtime environment hosted on Microsoft Azure. You cannot choose a serverless runtime environment if a local runtime environment was previously selected. The Cloud Hosted Agent is not supported.
     Select Set as default to use the specified runtime environment as the default for all tasks you create. Otherwise, leave this check box cleared.
     Description
     Optionally, enter a description for the task. The maximum length is 4,000 characters.
     Schedule
     If you want to run an initial load task based on a schedule instead of starting it manually, select Run this task based on a schedule. Then select a schedule that was previously defined in Administrator. The default option is Do not run this task based on a schedule.
     This field is not available for incremental load and combined initial and incremental load tasks.
     To view and edit the schedule options, go to Administrator. If you edit the schedule, the changes apply to all jobs that use the schedule. If you edit the schedule after deploying the task, you do not need to redeploy the task.
     If the schedule criteria for running the job are met but the previous job run is still active, Application Ingestion and Replication skips the new job run.
     Execute in Taskflow
     Select this check box to make the task available in Data Integration to add to a taskflow as an event source. You can then include transformations in the taskflow to transform the ingested data. Available for initial load and incremental load tasks with Snowflake targets that don't use the Superpipe option.
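As a quick illustration, the documented task-name rules can be expressed as a validation check. This is a sketch based on my reading of the rules above (Latin alphanumerics, spaces, the listed punctuation, maximum 50 characters), not a product API:

```python
import re

# Characters allowed per the documentation: Latin alphanumerics, spaces,
# periods, commas, underscores, plus signs, and hyphens; max length 50.
TASK_NAME_RE = re.compile(r"^[A-Za-z0-9 .,_+-]{1,50}$")

def is_valid_task_name(name: str) -> bool:
    """Return True if the name satisfies the documented constraints."""
    return bool(TASK_NAME_RE.fullmatch(name))

print(is_valid_task_name("Orders_initial-load.v2"))  # True
print(is_valid_task_name("bad*name"))                # False
```

Note that case insensitivity and the space-stripping in job names are server-side behaviors; the check above only covers the allowed-character and length rules.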
  2. To display advanced properties, toggle on Show Advanced Options.
  3. Optionally, edit the value in the Number of Rows in Output File field to specify the maximum number of rows that the application ingestion and replication task writes to an output file.
     The Number of Rows in Output File field is not displayed for jobs that have an Apache Kafka target or that use the Superpipe option for a Snowflake target.
     Valid values are 1 through 100000000. The default value for Amazon S3, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Infrastructure (OCI) Object Storage targets is 1000 rows. For the other targets, the default value is 100000 rows.
     For incremental load and combined initial and incremental load operations, change data is flushed to the target either when the specified number of rows is reached or when the flush latency period expires and the job is not in the middle of processing a transaction. The flush latency period is the time that the job waits for more change data before flushing data to the target. The latency period is fixed at 10 seconds and cannot be changed.
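The flush rule described above can be sketched as a simple decision function. This is an illustration of the documented behavior, not the actual product logic; the function name and parameters are my own:

```python
FLUSH_LATENCY_SECS = 10          # fixed per the documentation, not configurable
ROWS_PER_OUTPUT_FILE = 100000    # "Number of Rows in Output File" default for most targets

def should_flush(buffered_rows: int, secs_since_last_flush: float,
                 in_open_transaction: bool) -> bool:
    """Flush when the row limit is hit, or when the 10-second latency
    expires while the job is not mid-transaction."""
    if buffered_rows >= ROWS_PER_OUTPUT_FILE:
        return True
    if secs_since_last_flush >= FLUSH_LATENCY_SECS and not in_open_transaction:
        return True
    return False
```

The second branch models the documented condition that a latency-based flush waits for any in-flight transaction to finish first.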
  4. For initial load jobs only, optionally clear the File Extension Based on File Type check box if you want the output data files for Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, or Microsoft Fabric OneLake targets to have the .dat extension. This check box is selected by default, which causes the output files to have file-name extensions based on their file types.
     For incremental load jobs with these target types, this option is not available. Application Ingestion and Replication always uses output file-name extensions based on file type.
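The extension behavior amounts to a two-way choice. A minimal sketch, assuming for illustration that the file type name maps directly to its extension (the actual mapping is the product's):

```python
def output_extension(file_type: str, extension_based_on_file_type: bool) -> str:
    """If the check box is selected, name files by type; otherwise use .dat."""
    return f".{file_type.lower()}" if extension_based_on_file_type else ".dat"

print(output_extension("csv", True))   # .csv
print(output_extension("csv", False))  # .dat
```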
  5. Optionally, configure an apply cycle. An apply cycle is a cycle of applying change data that starts with fetching the intermediate data from the source and ends with the commit of the data to the target. For continuous replication, the source processes the data in multiple low-latency apply cycles.
     For application ingestion and replication incremental load tasks that have Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, or Microsoft Fabric OneLake targets, you can configure the following apply cycle options:
     Apply Cycle Interval
     Specifies the amount of time that must elapse before an application ingestion and replication job ends an apply cycle. You can specify days, hours, minutes, and seconds, or specify values for a subset of these time fields and leave the other fields blank. The default value is 15 minutes.
     Apply Cycle Change Limit
     Specifies the number of records that must be processed before an application ingestion and replication job ends an apply cycle. When this record limit is reached, the ingestion job ends the apply cycle and writes the change data to the target. The default value is 10000 records.
     During startup, jobs might reach this limit more frequently than the apply cycle interval if they need to catch up on processing a backlog of older data.
     Low Activity Flush Interval
     Specifies the amount of time, in hours, minutes, or both, that must elapse during a period of no change activity on the source before an application ingestion and replication job ends an apply cycle. When this time limit is reached, the ingestion job ends the apply cycle and writes the change data to the target. If you do not specify a value for this option, an application ingestion and replication job ends apply cycles only after either the Apply Cycle Change Limit or the Apply Cycle Interval limit is reached. No default value is provided.
     • Either the Apply Cycle Interval or the Apply Cycle Change Limit field must have a non-zero value or use the default value.
     • An apply cycle ends when the job reaches any of the three limits, whichever limit is met first.
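The "whichever limit is met first" rule can be sketched as follows. The option names come from the documentation; the data structure and function are illustrative only:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ApplyCycleState:
    secs_elapsed: float        # time since the apply cycle started
    records_processed: int     # change records handled in this cycle
    secs_idle: float           # time with no change activity on the source

def cycle_should_end(state: ApplyCycleState,
                     interval_secs: float = 15 * 60,   # Apply Cycle Interval default
                     change_limit: int = 10000,        # Apply Cycle Change Limit default
                     low_activity_secs: Optional[float] = None) -> bool:
    """The cycle ends as soon as any configured limit is reached."""
    if state.secs_elapsed >= interval_secs:
        return True
    if state.records_processed >= change_limit:
        return True
    # Low Activity Flush Interval has no default; it applies only if set.
    if low_activity_secs is not None and state.secs_idle >= low_activity_secs:
        return True
    return False
```

This also shows why Low Activity Flush Interval is optional: with it unset, only the interval and change-limit checks can end a cycle.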
  6. For incremental load jobs that have an Apache Kafka target, configure the following Checkpoint Options:
     Checkpoint All Rows
     Indicates whether an application ingestion and replication job performs checkpoint processing for every message that is sent to the Kafka target. If this check box is selected, the Checkpoint Every Commit, Checkpoint Row Count, and Checkpoint Frequency (secs) options are ignored.
     Checkpoint Every Commit
     Indicates whether an application ingestion and replication job performs checkpoint processing for every commit that occurs on the source.
     Checkpoint Row Count
     Specifies the maximum number of messages that an application ingestion and replication job sends to the target before adding a checkpoint. If you set this option to 0, the job does not perform checkpoint processing based on the number of messages. If you set this option to 1, the job adds a checkpoint for each message.
     Checkpoint Frequency (secs)
     Specifies the maximum number of seconds that must elapse before an application ingestion and replication job adds a checkpoint. If you set this option to 0, the job does not perform checkpoint processing based on elapsed time.
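How the four checkpoint options interact can be sketched as a per-message decision. The option names are the documented ones; the function shape and the per-message evaluation are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class CheckpointOptions:
    checkpoint_all_rows: bool = False
    checkpoint_every_commit: bool = False
    checkpoint_row_count: int = 0       # 0 disables count-based checkpoints
    checkpoint_frequency_secs: int = 0  # 0 disables time-based checkpoints

def should_checkpoint(opts: CheckpointOptions, msgs_since_ckpt: int,
                      secs_since_ckpt: float, at_source_commit: bool) -> bool:
    if opts.checkpoint_all_rows:
        # Per the documentation, this overrides the other three options.
        return True
    if opts.checkpoint_every_commit and at_source_commit:
        return True
    if opts.checkpoint_row_count > 0 and msgs_since_ckpt >= opts.checkpoint_row_count:
        return True
    if opts.checkpoint_frequency_secs > 0 and secs_since_ckpt >= opts.checkpoint_frequency_secs:
        return True
    return False
```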
  7. Under Schema Drift Options, if the detection of schema drift is supported for your source and target combination, specify the schema drift option to use for each of the supported types of DDL operations.
     The Schema Drift Options section appears only for incremental load and combined initial and incremental load tasks, and only for sources that support automatic detection of schema changes.
     The following schema drift options are available for each DDL operation type:
     Ignore
     Do not replicate DDL changes that occur on the source database to the target.
     Replicate
     Allow the application ingestion and replication job to replicate the DDL changes to the target.
     • Add Field operations that add a primary-key field are not supported and might cause unpredictable results.
     • Modify Field operations that change the NULL or NOT NULL constraint of a field are not replicated to the target.
     The supported types of DDL operations are:
     • Add Column
     • Modify Column
     • Drop Column
     • Rename Column
     Application ingestion and replication jobs don't support modifying or renaming columns for Google BigQuery targets, or adding columns for Oracle targets.
     Stop Job
     Stop the application ingestion and replication job.
     Stop Table
     Stop processing the source object on which the DDL change occurred.
     When one or more objects are excluded from replication because of the Stop Table schema drift option, the status of the job changes to Running with Warning. The application ingestion and replication job cannot retrieve the data changes that occurred on the source object after the job stops processing the changes, which leads to data loss on the target. To avoid data loss, you must resynchronize the source and target objects that the job stopped processing before you resume the application ingestion and replication job.
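The four schema drift actions amount to a dispatch on the configured option for each DDL operation type. A sketch of that dispatch, using the documented option names (the function and return strings are illustrative, not product behavior descriptions):

```python
DRIFT_OPTIONS = {"Ignore", "Replicate", "Stop Job", "Stop Table"}

def handle_schema_drift(operation: str, configured_action: str) -> str:
    """Describe what the job does when a DDL operation is detected."""
    if configured_action not in DRIFT_OPTIONS:
        raise ValueError(f"unknown schema drift option: {configured_action}")
    if configured_action == "Ignore":
        return f"{operation}: not replicated to the target"
    if configured_action == "Replicate":
        return f"{operation}: DDL change replicated to the target"
    if configured_action == "Stop Job":
        return f"{operation}: job stopped"
    # Stop Table: only the affected source object is excluded; the job
    # continues with the Running with Warning status.
    return f"{operation}: source object excluded; job runs with warning"
```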
  8. Under Custom Properties, you can specify one or more custom properties that Informatica provides to meet your special requirements. To add a property, enter the property name and value in the Create Property field, and then click Add Property.
     Specify these properties only at the direction of Informatica Global Customer Support. Usually, these properties address unique environments or special processing needs. You can specify multiple properties, if necessary. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
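The property-name rule above is stricter than the task-name rule (no spaces, commas, or plus signs). A sketch of the check, again my reading of the documented rule rather than a product API:

```python
import re

# Allowed per the documentation: alphanumerics, periods, hyphens, underscores.
PROPERTY_NAME_RE = re.compile(r"^[A-Za-z0-9._-]+$")

def is_valid_property_name(name: str) -> bool:
    return bool(PROPERTY_NAME_RE.fullmatch(name))

print(is_valid_property_name("custom.option_1"))  # True
print(is_valid_property_name("custom option"))    # False (space not allowed)
```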
  9. Click Save to save the task.
  10. Click Deploy to deploy a job instance for the task, or click View to view or edit the task.
     You can run a job that has the status of Deployed from the My Jobs page.
