Data Profiling


Advanced options

You can configure the advanced options to detect outliers, infer date and time values, and set other profile-related parameters.

The following table lists the advanced options that you can configure for a profile:

Option	Description
Maximum Number of Value Frequency Pairs	Number of column values with the highest frequencies that appear in the profile results. Default is 500. For example, if you set the value to 100, only the 100 most frequent values appear in the profile results. If you do not want to save the value frequency information of a profile in the profiling warehouse, set the value to 0.
Maximum Number of Patterns	Number of patterns with the most occurrences that appear in the profile results. The rest of the patterns appear under the Others category in the Results area. Default is 10. For example, if you set the value to 3, the top 3 patterns appear with their statistics, and the rest of the patterns are consolidated under the Others category.
Pattern Threshold Percentage	Minimum percentage of column values that a pattern must match to appear individually in the profile results. Default is 5. For example, when you set the value to 4, the patterns that account for 4% or more of the values appear individually with their statistics, and the rest of the patterns are consolidated under the Others category.
Infer Date and Time	Infers the date and time for a column of date or time data type. Default is Yes.
Detect Outliers	Detects pattern and value frequency outliers in the source object. Default is Yes.
Minimum Number of Rows for Split Process per Column	If the source object contains more rows than the minimum number of rows that you enter here, Data Profiling uses one subtask for each source column when the profile is run. Default is 100,000,000.
Maximum Number of Columns per Mapping	Maximum number of columns in each mapping when the number of source rows is less than the Minimum Number of Rows for Split Process per Column value. Default is 50.
Maximum Memory per Mapping*	Maximum amount of memory that you want to allocate for each mapping. Default is 512 MB.
Default buffer block size	Size of buffer blocks used to move data blocks from sources to targets. Default is Auto. Enter one of the following options: Auto, which uses automatic memory settings and requires that you configure Maximum Memory per Mapping; or a numeric value. The default unit of measure for a numeric value is bytes. Append KB, MB, or GB to the value to specify a different unit of measure. For example, 512MB.
DTM Buffer Size	Amount of memory allocated to the task from the DTM process. Default is Auto. By default, a minimum of 12 MB is allocated to the buffer at run time. Use one of the following options: Auto, which uses automatic memory settings and requires that you configure Maximum Memory per Mapping; or a numeric value. The default unit of measure for a numeric value is bytes. Append KB, MB, or GB to the value to specify a different unit of measure. For example, 512MB.
Line Sequential Buffer Length	Number of bytes that the task reads for each row in a flat file source. Default is 1024.
* A mapping is a type of subtask that Data Profiling creates and runs for a data profiling task to process the data concurrently.
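
To make the value frequency and pattern options more concrete, the following Python sketch mimics how these limits might shape the profile results. It is an illustration only, not how Data Profiling is implemented: the function names and the sample pattern strings are hypothetical, and the sketch assumes that a pattern must satisfy both Maximum Number of Patterns and Pattern Threshold Percentage to appear individually.

```python
from collections import Counter


def top_value_frequencies(values, max_pairs=500):
    """Return the max_pairs most frequent (value, count) pairs.

    A max_pairs of 0 returns an empty list, mirroring the option
    setting that keeps no value frequency information.
    """
    return Counter(values).most_common(max_pairs)


def summarize_patterns(pattern_counts, max_patterns=10, threshold_pct=5.0):
    """Keep the leading patterns and fold the rest into "Others".

    Assumption: a pattern is shown individually only if it is among the
    first max_patterns by count AND covers at least threshold_pct percent
    of all values.
    """
    total = sum(pattern_counts.values())
    if total == 0:
        return {}
    ranked = sorted(pattern_counts.items(), key=lambda kv: kv[1], reverse=True)
    shown, others = {}, 0
    for pattern, count in ranked:
        pct = 100.0 * count / total
        if len(shown) < max_patterns and pct >= threshold_pct:
            shown[pattern] = count
        else:
            others += count
    if others:
        shown["Others"] = others
    return shown


# Hypothetical pattern counts for one column (100 values in total).
counts = {"9(5)": 60, "X(3)": 25, "9(3)-9(4)": 12, "X(10)": 3}
summary = summarize_patterns(counts, max_patterns=3, threshold_pct=4)
```

With these sample counts, the three leading patterns stay visible and the pattern that covers 3% of the values is consolidated under Others.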

The default values for the advanced options have been derived to provide the best performance. However, you can configure the values based on your requirements. To optimize the data profiling task performance, see Tuning data profiling task performance.
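
When you tune these values, it can help to estimate how many mappings a task would create. The following Python sketch is a rough reading of the two split-related options in the table, not the product's actual scheduling logic. The function name is hypothetical, and the sketch assumes that a row count exactly equal to the minimum does not trigger the per-column split.

```python
import math


def planned_mappings(row_count, column_count,
                     min_rows_for_split=100_000_000,
                     max_columns_per_mapping=50):
    """Estimate the number of mappings (subtasks) for one profile run."""
    if row_count > min_rows_for_split:
        # Large source: one subtask for each source column.
        return column_count
    # Smaller source: group columns, at most max_columns_per_mapping each.
    return math.ceil(column_count / max_columns_per_mapping)


large_source = planned_mappings(250_000_000, 120)
small_source = planned_mappings(1_000_000, 120)
```

Under these assumptions, a 120-column source with 250 million rows is split into 120 subtasks, while the same source with 1 million rows is grouped into 3 mappings.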

You can configure the following advanced options for a profile with Avro or Parquet source objects:

Maximum Number of Value Frequency Pairs

Maximum Number of Patterns

Pattern Threshold Percentage

Detect Outliers
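
As an illustration of this restriction, the following Python sketch checks a set of requested options against the list above, using the option names from the table. The function, the dictionary layout, and the idea of validating options in client code are assumptions made for this example. Data Profiling does not expose this function.

```python
# Options that the list above names for Avro or Parquet source objects.
AVRO_PARQUET_OPTIONS = {
    "Maximum Number of Value Frequency Pairs",
    "Maximum Number of Patterns",
    "Pattern Threshold Percentage",
    "Detect Outliers",
}


def unsupported_options(requested, source_format):
    """Return the requested option names that the list does not cover.

    For source formats other than Avro or Parquet, nothing is flagged.
    """
    if source_format.lower() not in {"avro", "parquet"}:
        return []
    return sorted(name for name in requested if name not in AVRO_PARQUET_OPTIONS)


# Hypothetical option settings for a Parquet source.
flagged = unsupported_options(
    {"Detect Outliers": "Yes", "DTM Buffer Size": "Auto"}, "Parquet"
)
```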
