Data Discovery Guide

Column Profile Sampling Options for Enterprise Discovery

The sampling options determine whether the Developer tool runs a column profile on all rows of the data sources or on a limited number of rows.

The following table describes the column profile sampling options that you configure for enterprise discovery:

Option	Description
All Rows	Runs a profile on all the rows in the data object. Supported on the Native, Blaze, and Spark run-time environments.
Sample First <number> rows	Runs a profile on a sample of rows taken from the beginning of the data object. You can choose a maximum of 2,147,483,647 rows. Supported on the Native and Blaze run-time environments.
Limit N <number> rows	Runs a profile on a limited number of rows in the data object, based on the number that you specify. When you choose to run a profile in the Hadoop validation environment, the Spark engine collects samples from multiple partitions of the data object and pushes the samples to a single node to compute the sample size. The Limit N sampling option supports Oracle, SQL Server, and DB2 databases. You cannot apply the Advanced filter with the Limit N sampling option. Supported on the Spark run-time environment.
Random Percentage	Runs a profile on a random percentage of rows in the data object. Supported on the Spark run-time environment.
Exclude data type inference for columns with an approved data type	Excludes columns with an approved data type from the data type inference of the column profile run.
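
The sampling options in the table can be illustrated with a minimal Python sketch. This is not Informatica code and does not call any Informatica API. The names SUPPORTED, is_supported, sample_first, and random_percentage are hypothetical, the per-row random draw is only one plausible way to take a percentage sample, and rejecting negative row counts is an assumption. Only the run-time environment support and the 2,147,483,647 row maximum come from the table.

```python
import random
from itertools import islice

# Maximum row count documented for the Sample First option (2**31 - 1).
MAX_SAMPLE_ROWS = 2_147_483_647

# Run-time environments listed for each sampling option in the table.
# The dictionary keys are hypothetical labels, not product identifiers.
SUPPORTED = {
    "all_rows": {"native", "blaze", "spark"},
    "sample_first": {"native", "blaze"},
    "limit_n": {"spark"},
    "random_percentage": {"spark"},
}


def is_supported(option, environment):
    """Return True if the table lists the environment for the option."""
    return environment.lower() in SUPPORTED[option]


def sample_first(rows, n):
    """Take the first n rows, mirroring the idea behind Sample First."""
    # Assumption: negative counts are rejected; the table states only a maximum.
    if n < 0 or n > MAX_SAMPLE_ROWS:
        raise ValueError(f"n must be between 0 and {MAX_SAMPLE_ROWS}")
    return list(islice(rows, n))


def random_percentage(rows, percent, seed=None):
    """Keep each row with probability percent / 100.

    Assumption: an independent draw per row; the engine's actual
    sampling method is not described in the table.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100]
```

For example, sample_first(range(10), 3) returns the first three rows, while random_percentage(range(1000), 10) returns roughly, but not exactly, 100 rows because each row is drawn independently.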