Data Profiling

Statistics extracted from source objects

When you run a profile, the profile extracts column statistics, patterns, data types, value frequencies, and outliers for the columns and rules in the source object.

Column Statistics

The following list describes the column statistics that you can view after you run a profile:
Columns and Rules: Columns and rules in the profile run appear in collapsible sections. You can collapse or expand a section to view the columns and their statistics. When you click a metric for a column, the metric is highlighted in the Data Preview area. When you click a column name, the detailed view for the column appears.
Value Distribution: Distribution of null values, distinct values, and non-distinct values for a column or rule, shown as a horizontal bar chart.
% Null: Percentage of rows with null values in the column or rule.
# Null: Number of null values in the column or rule.
% Distinct: Percentage of rows with distinct values in the column or rule.
# Distinct: Number of distinct values in the column or rule.
% Non-distinct: Percentage of rows with non-distinct values in the column or rule.
# Non-distinct: Number of non-distinct values in the column or rule.
# Patterns: Number of patterns in the column or rule.
% of Top Pattern: Percentage of rows with the most frequent pattern in the column or rule.
Maximum Length: Length of the longest value in the column.
Maximum Value: Highest value in the column.
Minimum Length: Length of the shortest value in the column.
Minimum Value: Lowest value in the column.
% Blank: Percentage of rows that have no value in the column or rule.
# Blank: Number of rows that have no value in the column or rule.
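The following Python sketch shows how the null, distinct, non-distinct, blank, length, and value statistics described above could be computed for a single column of string values. It is an illustration, not Data Profiling's implementation. It assumes that None is a null, that an empty or whitespace-only string is a blank, that # Distinct counts the different non-null values, and that # Non-distinct counts the remaining non-null rows, so that the null, distinct, and non-distinct shares together cover every row, as in the Value Distribution bar chart. The names ColumnStats and column_stats are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnStats:
    total_rows: int
    null_count: int
    distinct_count: int
    non_distinct_count: int
    blank_count: int
    min_length: Optional[int]
    max_length: Optional[int]
    min_value: Optional[str]
    max_value: Optional[str]

    def pct(self, count: int) -> float:
        """Express a count as a percentage of all rows in the column."""
        return 100.0 * count / self.total_rows if self.total_rows else 0.0


def column_stats(values: Sequence[Optional[str]]) -> ColumnStats:
    # Assumption: None is a null value; everything else is non-null.
    non_null = [v for v in values if v is not None]
    # Assumption: distinct = number of different non-null values, and
    # non-distinct = the non-null rows that repeat an earlier value.
    distinct = len(set(non_null))
    lengths = [len(v) for v in non_null]
    return ColumnStats(
        total_rows=len(values),
        null_count=len(values) - len(non_null),
        distinct_count=distinct,
        non_distinct_count=len(non_null) - distinct,
        # Assumption: a blank is an empty or whitespace-only string.
        blank_count=sum(1 for v in non_null if v.strip() == ""),
        min_length=min(lengths, default=None),
        max_length=max(lengths, default=None),
        min_value=min(non_null, default=None),
        max_value=max(non_null, default=None),
    )
```

For example, column_stats(["Boston", "Austin", None, "Boston", "", "Denver"]) reports one null, four distinct values, one non-distinct value, and one blank under these assumptions.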

Patterns

You can view inferred patterns after you run a profile.
The following list describes the pattern characters and what they represent:
'B', 'b', or ' ' (space): Represents a blank space.
'C' or 'c': Represents any character.
'L' or 'l': Represents any lowercase alphabetic character.
'T' or 't': Represents a tab.
'U' or 'u': Represents any uppercase alphabetic character.
'9': Represents any numeric character. Data Profiling displays up to three numeric characters individually in the "9" format. It displays more than three characters as a count within parentheses. For example, the pattern "9(8)" represents a numeric value with eight digits.
'X' or 'x': Represents any alphabetic character. Data Profiling displays up to three alphabetic characters individually in the "X" format. It displays more than three characters as a count within parentheses. For example, the pattern "X(6)" might represent the value "Boston". The pattern character X is not case sensitive and might represent uppercase or lowercase characters from the source data.
'P' or 'p': Represents "(", the opening parenthesis.
'Q' or 'q': Represents ")", the closing parenthesis.
Column patterns can also include special characters, such as ~, [, ], =, -, ?, {, *, >, <, and $.
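As an illustration of how a pattern such as "X(6)" or "9(8)" could be built, the following Python sketch maps each character of a value to a pattern character and collapses runs of more than three X or 9 characters into the count-in-parentheses form. It also derives # Patterns and % of Top Pattern from the inferred patterns of the non-null values. The character mapping (letters to X, digits to 9, space to b, tab to t, parentheses to p and q, other characters kept as-is) and the function names are assumptions; the rules that Data Profiling uses to choose between characters such as X, U, and L are not described here.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def char_class(ch: str) -> str:
    # Hypothetical mapping chosen for illustration only.
    if ch.isalpha():
        return "X"
    if ch.isdigit():
        return "9"
    return {" ": "b", "\t": "t", "(": "p", ")": "q"}.get(ch, ch)


def infer_pattern(value: str) -> str:
    """Build a pattern such as "X(6)" for "Boston" or "999" for "123"."""
    classes = [char_class(ch) for ch in value]
    parts: List[str] = []
    i = 0
    while i < len(classes):
        # Find the end of the run of identical pattern characters.
        j = i
        while j < len(classes) and classes[j] == classes[i]:
            j += 1
        run, symbol = j - i, classes[i]
        # Runs of up to three X or 9 characters are written out;
        # longer runs use the count-in-parentheses form, e.g. 9(8).
        if symbol in ("X", "9") and run > 3:
            parts.append(f"{symbol}({run})")
        else:
            parts.append(symbol * run)
        i = j
    return "".join(parts)


def pattern_stats(values: Iterable[Optional[str]]) -> Tuple[int, float]:
    """Return (# Patterns, % of Top Pattern) over the non-null values."""
    counts = Counter(infer_pattern(v) for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    _, top = counts.most_common(1)[0]
    return len(counts), 100.0 * top / total
```

Under these assumptions, infer_pattern("(555) 123-4567") returns "p999qb999-9(4)", and pattern_stats(["02134", "94105", "1234", None]) finds two patterns, with the top pattern 9(5) covering two of the three non-null values.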

Data Types

You can view the documented data type and inferred data types after you run a profile.
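The rules that Data Profiling uses to infer data types are not described here. Purely as a hypothetical illustration, the following Python sketch classifies each non-null string value as Integer, Decimal, Date, or String, in that order of precedence, and reports the percentage of values that match each inferred type. The type names, the precedence, and the use of ISO 8601 dates as accepted by datetime.date.fromisoformat (such as 2024-01-15) are assumptions.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Optional


def infer_type(value: str) -> str:
    """Classify one value; the type names and their order are assumptions."""
    text = value.strip()
    try:
        int(text)
        return "Integer"
    except ValueError:
        pass
    try:
        float(text)
        return "Decimal"
    except ValueError:
        pass
    try:
        date.fromisoformat(text)
        return "Date"
    except ValueError:
        return "String"


def inferred_types(values: Iterable[Optional[str]]) -> Dict[str, float]:
    """Percentage of non-null values that match each inferred type."""
    counts = Counter(infer_type(v) for v in values if v is not None)
    total = sum(counts.values())
    return {t: 100.0 * n / total for t, n in counts.most_common()}
```

A column whose values mostly infer as one type, but whose documented data type differs, is the kind of mismatch that comparing the documented and inferred data types can reveal.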

Value frequencies

After you run a profile, you can view the value frequencies for each column in the summary view and the detailed view of the profile results.
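A value frequency is the number of rows in which a value occurs. The following Python sketch computes value frequencies for one column with collections.Counter and lists the most frequent values first. Counting None as a value of its own and reporting percentages of all rows are assumptions for illustration, and value_frequencies is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def value_frequencies(
    values: Iterable[Optional[str]],
) -> List[Tuple[Optional[str], int, float]]:
    """Return (value, count, percent of rows), most frequent first."""
    rows = list(values)
    counts = Counter(rows)  # None is counted as its own value here.
    total = len(rows)
    return [(v, n, 100.0 * n / total) for v, n in counts.most_common()]
```

For example, value_frequencies(["MA", "CA", "MA", None]) lists "MA" first, with a count of 2 and 50 percent of the rows.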

Outliers

An outlier is a pattern, value, or frequency for a column in the profile results that does not fall within an expected range of values.
The Detect Outliers advanced option on the Schedule tab is enabled by default. During the profile run, the profile identifies the columns in the source object that have value frequency outliers and pattern outliers. The profile detects value frequency outliers based on the values or frequencies in the column, and it detects pattern outliers based on the patterns in the column.
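The detection algorithm behind the Detect Outliers option is not described here. Purely as a hypothetical illustration of a frequency-based outlier, the following Python sketch flags any value or pattern whose share of the rows falls below a threshold. The function name frequency_outliers and the 5 percent default threshold are assumptions, not Data Profiling settings.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def frequency_outliers(
    items: Iterable[Hashable], threshold_pct: float = 5.0
) -> List[Hashable]:
    """Return the items whose share of all rows is below threshold_pct.

    Hypothetical rule for illustration only; it is not the detection
    algorithm that Data Profiling uses.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return []
    return [item for item, n in counts.items() if 100.0 * n / total < threshold_pct]
```

You can pass either raw values or inferred patterns. For example, with 40 ZIP codes that match the pattern 9(5) and one that matches 9(4), this rule flags only 9(4), which covers about 2.4 percent of the rows.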
You can view the outliers in the source object in the following areas:
Profile results: summary view
In the summary view, you can view the columns that contain outlier values. To view the columns with outliers, choose the Value Frequency Outliers or Pattern Outliers filter in the results area.
The following image shows an example of the Value Frequency Outliers and Pattern Outliers filters in the results area:
[Image: The value frequency outliers and pattern outliers filters that you can choose to view the outliers.]
Profile results: detailed view
In the detailed view, you can view the outlier values in a column. The outlier values appear with an orange underline in the bar chart and an orange vertical bar in the value distribution table.
The following sample image shows the outlier values in the results area:
[Image: The outlier values in the bar chart and value distribution table.]
