Data Discovery and Sampling Options on the Spark Engine
Effective in version 10.4.0, you can run profiles to discover data domains and choose sampling options on the Spark engine.
Data Domain Discovery on the Spark Engine
You can perform data domain discovery on the Spark engine.
Sampling Options on the Spark Engine
You can choose the following sampling options to discover data domains on the Spark engine:
Limit n
Runs a profile based on a number of rows in the data object. When you choose to discover data domains in the Hadoop environment, the Spark engine collects samples from multiple partitions of the data object and pushes the samples to a single node to compute the sample size.
Random percentage
Runs a profile on a percentage of rows in the data object.
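The difference between the two sampling options can be illustrated with a minimal sketch in plain Python. This is not the product's implementation: the function names limit_n_sample and random_percentage_sample, the list-of-lists model of partitions, the choice to keep the first n gathered rows, and the per-row random draw are all assumptions made for illustration.

```python
import random
from itertools import islice


def limit_n_sample(partitions, n):
    """Hypothetical Limit n: gather rows from every partition, keep n."""
    gathered = []
    for part in partitions:
        # Each partition contributes at most n rows, so the single node
        # receives no more than n * len(partitions) rows.
        gathered.extend(islice(part, n))
    # The single node trims the gathered rows to the requested size.
    # Keeping the first n rows is an assumption of this sketch.
    return gathered[:n]


def random_percentage_sample(partitions, percent, seed=None):
    """Hypothetical Random percentage: keep each row with probability percent/100."""
    rng = random.Random(seed)
    fraction = percent / 100.0
    return [row for part in partitions for row in part if rng.random() < fraction]


# Three small partitions standing in for a partitioned data object.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

limit_rows = limit_n_sample(partitions, 5)
print(limit_rows)  # [1, 2, 3, 4, 5]

percent_rows = random_percentage_sample(partitions, 50, seed=7)
print(len(percent_rows), "of 9 rows selected")
```

In this sketch, Limit n always returns exactly n rows when the data object has at least n rows, while Random percentage returns a row count that varies around the requested percentage from run to run unless a seed is fixed.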