Table of Contents

  1. Abstract
  2. Supported Versions
  3. Tuning and Sizing Guidelines for Data Engineering Integration (10.4.x)

Tuning and Sizing Guidelines for Data Engineering Integration (10.4.x)

Data Engineering Streaming Sizing and Tuning Recommendations

Use Informatica® Data Engineering Streaming mappings to collect streaming data, build the business logic for the data, and push the logic to a Spark engine for processing. The Spark engine uses Spark Streaming to process the data. A streaming mapping includes streaming sources such as Kafka or JMS. The Spark engine reads the data, divides it into micro batches, and publishes it.
Streaming mappings run continuously. When you create and run a streaming mapping, a Spark application is created on the Hadoop cluster that runs indefinitely unless it is killed or cancelled through the Data Integration Service. Because a batch is triggered for every micro batch interval configured for the mapping, consider the following recommendations (see the sketch after this list):
  • The processing time for each batch must remain stable over the entire run.
  • The processing time of every batch must be less than the batch interval.
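
To illustrate why the batch interval matters, the following Scala sketch shows a plain Spark Structured Streaming job that triggers a micro batch at a fixed interval. This is not the code that Data Engineering Streaming generates for a mapping; it is a minimal hand-written example, and the broker address (broker1:9092) and topic (events) are hypothetical placeholders. If each batch takes longer to process than the trigger interval, batches queue up and end-to-end latency grows without bound.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object MicroBatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-microbatch-sketch")
      .getOrCreate()

    // Read a Kafka topic as an unbounded streaming DataFrame.
    // The broker and topic names below are placeholders.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Trigger a micro batch every 30 seconds. For the job to keep up,
    // each batch must finish processing in under 30 seconds; otherwise
    // batches accumulate and latency increases over time.
    val query = events
      .selectExpr("CAST(value AS STRING) AS value")
      .writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("30 seconds"))
      .start()

    query.awaitTermination()
  }
}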
