
Data Discovery and Sampling Options on the Spark Engine


Effective in version 10.4.0, you can run profiles to discover data domains and choose sampling options on the Spark engine.
Data Domain Discovery on the Spark Engine
You can perform data domain discovery on the Spark engine.
Sampling Options on the Spark Engine
You can choose the following sampling options to discover data domains on the Spark engine:
  • Limit n
    The Limit n sampling option runs a profile on a limited number of rows (n) in the data object. When you choose to discover data domains in the Hadoop environment, the Spark engine collects samples from multiple partitions of the data object and pushes the samples to a single node to compute the sample size.
  • Random percentage
    The Random percentage sampling option runs a profile on a percentage of rows in the data object.
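The difference between the two sampling options can be illustrated with a minimal sketch in plain Python. This is not Informatica's or Spark's implementation: the function names `limit_n_sample` and `random_percentage_sample`, the list-of-lists model of partitions, and the `seed` parameter are all assumptions made for illustration. The sketch shows one plausible reading of the Limit n description, in which each partition contributes rows and the final sample is trimmed in a single place.

```python
import random
from typing import List, Optional, Sequence


def limit_n_sample(partitions: Sequence[Sequence[dict]], n: int) -> List[dict]:
    """Hypothetical Limit n: gather rows per partition, then trim to n in one place."""
    collected: List[dict] = []
    for partition in partitions:
        # Each partition contributes at most n rows to the combined sample.
        collected.extend(partition[:n])
    # The combined rows are cut down to the requested size in a single step,
    # standing in for the "single node" mentioned in the text.
    return collected[:n]


def random_percentage_sample(
    rows: Sequence[dict], percent: float, seed: Optional[int] = None
) -> List[dict]:
    """Hypothetical Random percentage: keep each row with probability percent/100."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100.0]


if __name__ == "__main__":
    # Two illustrative partitions of five rows each.
    parts = [
        [{"id": i} for i in range(0, 5)],
        [{"id": i} for i in range(5, 10)],
    ]
    all_rows = [row for part in parts for row in part]
    print(len(limit_n_sample(parts, 7)))
    print(len(random_percentage_sample(all_rows, 50, seed=1)))
```

In this sketch, Limit n always returns at most n rows, whereas the random percentage option returns a sample whose size varies around the requested percentage of the input.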
For more information, see the "Enterprise Data Catalog Concepts" chapter in the Informatica 10.4.0 Enterprise Catalog Administrator Guide.
