Release Guide


Profiles and Sampling Options on the Spark Engine

Effective in version 10.4.0, you can run profiles and choose sampling options on the Spark engine.

Profiling on the Spark engine: You can create and run profiles on the Spark engine in the Informatica Developer and Informatica Analyst tools. You can perform data domain discovery and create scorecards on the Spark engine.
Sampling options on the Spark engine: You can choose the following sampling options to run profiles on the Spark engine:
Limit n sampling option runs a profile on a specified number of rows in the data object. When you choose to run a profile in the Hadoop environment, the Spark engine collects samples from multiple partitions of the data object and pushes the samples to a single node to compute the sample size. You cannot apply the Limit n sampling option to profiles that use an advanced filter. The Limit n sampling option is supported on Oracle databases through a Sqoop connection.

Random percentage sampling option runs a profile on a percentage of rows in the data object.
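
The difference between the two sampling options can be illustrated with a minimal, plain-Python sketch. This is not Informatica or Spark code: the function names limit_n_sample and random_percentage_sample, the list-of-lists model of partitions, and the seed parameter are all assumptions made for illustration. The sketch only mirrors the behavior described above, where rows are gathered from several partitions and then combined in one place to produce the final sample.

```python
import random
from itertools import islice


def limit_n_sample(partitions, n):
    """Hypothetical sketch of Limit n: gather up to n rows from each
    partition, then trim the combined rows to n in a single place."""
    collected = []
    for partition in partitions:
        # Each partition contributes at most n candidate rows.
        collected.extend(islice(partition, n))
    # The combined candidates are cut down to the requested sample size.
    return collected[:n]


def random_percentage_sample(rows, percentage, seed=None):
    """Hypothetical sketch of Random percentage: keep each row
    independently with probability percentage / 100."""
    rng = random.Random(seed)
    fraction = percentage / 100.0
    return [row for row in rows if rng.random() < fraction]


if __name__ == "__main__":
    # Three illustrative partitions of one data object.
    partitions = [[1, 2, 3], [4, 5, 6], [7, 8]]
    print(limit_n_sample(partitions, 4))

    all_rows = [row for partition in partitions for row in partition]
    print(random_percentage_sample(all_rows, 50, seed=42))
```

In this sketch, Limit n always returns at most n rows, while Random percentage returns a sample whose size varies from run to run unless a seed is fixed. The actual row selection on the Spark engine may differ.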

For information about the profiles and sampling options on the Spark engine, see