Data Engineering Integration
- Data Engineering Integration 10.4.1
| Sampling Option | Description |
|---|---|
| All rows | Runs a profile on all rows in the data object. Supported on the Native, Blaze, and Spark run-time environments. |
| Sample first <number> rows | Runs a profile on the specified number of rows, taken from the beginning of the data object. You can choose a maximum of 2,147,483,647 rows. Supported on the Native and Blaze run-time environments. |
| Random sample <number> rows | Runs a profile on the specified number of randomly selected rows in the data object. You can choose a maximum of 2,147,483,647 rows. Supported on the Native and Blaze run-time environments. |
| Random sample (auto) | Runs a profile on a sample whose size is computed from the number of rows in the data object. Supported on the Native and Blaze run-time environments. |
| Limit n <number> rows | Runs a profile based on the number of rows in the data object. When you run a profile in the Hadoop validation environment, the Spark engine collects samples from multiple partitions of the data object and pushes the samples to a single node to compute the sample size. The Limit n sampling option supports Oracle, SQL Server, and DB2 databases. You cannot apply the Advanced filter with the Limit n sampling option. Supported on the Spark run-time environment. |
| Random percentage | Runs a profile on a percentage of the rows in the data object. Supported on the Spark run-time environment. |
| Exclude approved data types and data domains from the data type and data domain inference in the subsequent profile run | Excludes the approved data types and data domains from data type and data domain inference in the next profile run. |
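To make the difference between the row-count and percentage options concrete, the following Python sketch mimics three of them on an in-memory sequence. It is an illustration only and not how the product implements sampling: the function names are hypothetical, reservoir sampling is an assumed technique for "Random sample <number> rows", and independent per-row selection is an assumed technique for "Random percentage". The only value taken from the table is the 2,147,483,647 row limit, which equals 2**31 - 1.

```python
import random
from itertools import islice

# Largest sample size listed in the table; equals 2**31 - 1.
MAX_SAMPLE_ROWS = 2_147_483_647


def _check_count(n):
    """Reject sample sizes outside the documented range (hypothetical helper)."""
    if not 1 <= n <= MAX_SAMPLE_ROWS:
        raise ValueError(f"sample size must be between 1 and {MAX_SAMPLE_ROWS}")


def sample_first(rows, n):
    """Mimics 'Sample first <number> rows': take n rows from the beginning."""
    _check_count(n)
    return list(islice(rows, n))


def sample_random(rows, n, seed=None):
    """Mimics 'Random sample <number> rows' with one-pass reservoir sampling.

    Assumption: the product's actual random-sampling method is not documented here.
    """
    _check_count(n)
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < n:
            # Fill the reservoir with the first n rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < n:
                reservoir[j] = row
    return reservoir


def sample_percentage(rows, percent, seed=None):
    """Mimics 'Random percentage': keep each row with probability percent / 100.

    Assumption: rows are selected independently, so the result size is
    approximately, not exactly, the requested percentage.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    rng = random.Random(seed)
    p = percent / 100
    return [row for row in rows if rng.random() < p]
```

For example, `sample_first(range(10), 3)` returns the first three values, while `sample_random(range(100), 5, seed=1)` returns five values drawn from anywhere in the input. If the input has fewer rows than the requested count, both row-count functions return every row.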