Informatica Data Quality
- Informatica Data Quality 10.4.0
| Option | Description |
|---|---|
| Enable column profiling | Runs a column profile as part of enterprise discovery. |
| Exclude approved data types and data domains from the data type and data domain inference in the subsequent profile runs | Excludes approved data types and data domains from data type and data domain inference, starting with the next profile run. |
| Option | Description |
|---|---|
| Native | The Analyst tool submits the profile jobs to the Profiling Service Module. The Profiling Service Module then breaks down the profile jobs into a set of mappings. The Data Integration Service runs these mappings and writes the profile results to the profiling warehouse. |
| Blaze | The Data Integration Service pushes the profile logic to the Blaze engine on the Hadoop cluster to run profiles. |
| Spark | The Data Integration Service pushes the profile logic to the Spark engine on the Hadoop cluster to run profiles. |
| Option | Description |
|---|---|
| All Rows | Runs a column profile on all rows in the data source. Supported on the Native, Blaze, and Spark run-time environments. |
| First <number> Rows | Runs a profile on a sample of rows taken from the beginning of the data object. You can choose a maximum of 2,147,483,647 rows. Supported on the Native and Blaze run-time environments. |
| Limit n <number> Rows | Runs a profile on a specified number of rows in the data object. When you choose to run a profile in the Hadoop validation environment, the Spark engine collects samples from multiple partitions of the data object and pushes the samples to a single node to compute the sample size. The Limit n sampling option supports Oracle, SQL Server, and DB2 databases. You cannot apply the Advanced filter with the Limit n sampling option. You can select a maximum of 2,147,483,647 rows. Supported on the Spark run-time environment. |
| Random percentage | Runs a profile on a percentage of rows in the data object. Supported on the Spark run-time environment. |
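
The support rules in the sampling table can be summarized as a small lookup. The following Python sketch is illustrative only and is not part of any Informatica API: the names `SUPPORTED_ENVIRONMENTS`, `MAX_SAMPLE_ROWS`, and `check_sampling` are hypothetical, the option keys are shortened forms of the labels in the table, and the lower bound of 1 row is an assumption. The upper bound of 2,147,483,647 comes from the table and equals 2^31 - 1.

```python
# Illustrative sketch only: none of these names belong to an Informatica API.

# Maximum row count stated in the sampling table (2,147,483,647 = 2**31 - 1).
MAX_SAMPLE_ROWS = 2**31 - 1

# Sampling option -> run-time environments that support it, per the table.
# Keys are shortened, hypothetical forms of the option labels.
SUPPORTED_ENVIRONMENTS = {
    "All Rows": {"Native", "Blaze", "Spark"},
    "First N Rows": {"Native", "Blaze"},
    "Limit n Rows": {"Spark"},
    "Random percentage": {"Spark"},
}


def check_sampling(option, environment, rows=None, advanced_filter=False):
    """Return a list of problems with a sampling choice (empty if none)."""
    if option not in SUPPORTED_ENVIRONMENTS:
        return [f"unknown sampling option: {option}"]

    problems = []
    if environment not in SUPPORTED_ENVIRONMENTS[option]:
        problems.append(f"{option} is not supported on {environment}")

    # Row-count options need a value within the documented maximum.
    # The minimum of 1 is an assumption, not stated in the table.
    if option in ("First N Rows", "Limit n Rows"):
        if rows is None or not 1 <= rows <= MAX_SAMPLE_ROWS:
            problems.append(
                f"row count must be between 1 and {MAX_SAMPLE_ROWS}"
            )

    # The table states that the Advanced filter cannot be used with Limit n.
    if option == "Limit n Rows" and advanced_filter:
        problems.append("the Advanced filter cannot be combined with Limit n")

    return problems
```

For example, `check_sampling("First N Rows", "Spark", rows=100)` reports that the option is not supported on Spark, while `check_sampling("All Rows", "Spark")` returns an empty list.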