Release Guide


Data Discovery and Sampling Options on the Spark Engine

Effective in version 10.4.0, you can run profiles to discover data domains and choose sampling options on the Spark engine.

Data Domain Discovery on the Spark Engine: You can perform data domain discovery on the Spark engine.

Sampling Options on the Spark Engine: You can choose the following sampling options to discover data domains on the Spark engine:
- Limit n: Runs a profile on a specified number of rows, n, in the data object. When you discover data domains in the Hadoop environment, the Spark engine collects samples from multiple partitions of the data object and pushes them to a single node to compute the sample size.
- Random percentage: Runs a profile on a percentage of the rows in the data object.
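The difference between the two options can be illustrated with a small conceptual sketch in Python. It is not Informatica's implementation. It models a data object as an in-memory list of partitions, and the function names `limit_n_sample` and `random_percentage_sample` are hypothetical. Limit n is modeled as a per-partition limit followed by a final cut on a single node, which roughly mirrors how Spark plans its own `limit` operator. Random percentage is modeled as Bernoulli sampling (each row kept independently), so the resulting row count only approximates the requested percentage.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Row = TypeVar("Row")


def limit_n_sample(partitions: Sequence[Sequence[Row]], n: int) -> List[Row]:
    """Hypothetical model of Limit n sampling (not Informatica's code)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Step 1: each partition contributes at most n candidate rows (a local limit).
    candidates = [row for part in partitions for row in list(part)[:n]]
    # Step 2: the candidates are combined on a single node (here, one list),
    # which cuts them to n rows, or fewer if the data object has fewer rows.
    return candidates[:n]


def random_percentage_sample(
    partitions: Sequence[Sequence[Row]], percent: float, seed: Optional[int] = None
) -> List[Row]:
    """Hypothetical model of Random percentage sampling (not Informatica's code)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rng = random.Random(seed)
    fraction = percent / 100.0
    # Keep each row independently with probability percent / 100, so the
    # sample size is only approximately percent% of the total row count.
    return [row for part in partitions for row in part if rng.random() < fraction]


if __name__ == "__main__":
    # Three partitions of ten rows each.
    parts = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]
    print(limit_n_sample(parts, 5))                            # [0, 1, 2, 3, 4]
    print(len(random_percentage_sample(parts, 50, seed=42)))  # roughly 15
```

Keeping the first n candidates on the single node is only one plausible reading of the Limit n description; the sketch uses it because it is deterministic and easy to check.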

For more information, see the "Enterprise Data Catalog Concepts" chapter in the Informatica 10.4.0 Enterprise Catalog Administrator Guide.
