Data Profiling

View profile results for a profile run

You can view the profile results for a profile run on the Results tab. The tab appears after you run the profile. The Results tab contains a header area with profile run details, a filter and sort area, a profile results area, a details and rules area, and a data preview area. The profile results area shows the profile results for all the columns and rules in the summary view. When you click a column, a detailed view of the profile results for the column appears in the area.
The following image shows the areas on the Results tab:
(Image: the results summary page.)
  1. Header
  2. Filter and sort
  3. Profile results
  4. Details and Rules
  5. Data Preview
You can also open a profile from the Explore page in Data Quality and perform the following tasks:
  • Edit a profile
  • Run a profile
  • View profile results
  • Create and run queries on the source object
  • Drill down on the profile results

Header

The header area shows the profile run details, which include the profile name, run number, number of columns and rules in the profile run, number of rows in the profile run, and run timestamp. The header area also displays a warning icon if the profile job runs with a warning. To view the job that ran with a warning, hover over the warning icon, and then click View.

Filter and sort

The following list describes the filter and sort options:
View. Shows the following options:
  • Columns and Rules. View the results for all the columns and rules in the profile run.
  • Columns. View the results for the columns in the profile run.
  • Rules. View the results for the rules in the profile run.
With. After you choose a View option, choose one of the following options:
  • All Statistics. View the complete profile results for the profile run.
  • 100% Null <number_of_rows>. View the results for the columns and rules that contain only null values.
  • 100% Distinct <number_of_rows>. View the results for the columns and rules that contain only distinct values.
  • 100% Constant <number_of_rows>. View the results for the columns and rules that have the same value in all rows.
  • Conflicting Data types <number_of_rows>. View the results for the columns and rules where the documented data type and the inferred data type do not match.
  • Value Frequency Outliers <number_of_rows>. View the results for the columns and rules that have value frequency outliers.
  • Pattern Outliers <number_of_rows>. View the results for the columns and rules that have pattern outliers.
Sort. Choose a column statistic to sort the results in ascending or descending order.
Filter. To filter the results, perform one or both of the following actions:
  • Add a column and enter a valid value. Add more columns with valid values as necessary.
  • Add a column statistic and enter a valid value. Add more column statistics with valid values as necessary.
Find. Enter a keyword to view the matching search results.
Menu. Choose Comfortable, Cozy, or Compact to adjust the row width in the profile results area.
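The 100% Null, 100% Distinct, and 100% Constant options under With reduce to simple checks on the values of a column. The following Python sketch shows one way to evaluate them on an in-memory column. It is an illustration only, not the product's implementation: the names `with_filters` and `columns_matching` and the {column: values} profile shape are hypothetical, and the rule that a column containing any null cannot be 100% Distinct or 100% Constant is an assumption.

```python
from typing import Any, Dict, List, Optional


def with_filters(values: List[Optional[Any]]) -> List[str]:
    """Return the 'With' filters that a column's values would match (hypothetical logic)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    matched: List[str] = []
    if total == 0:
        return matched
    # 100% Null: every row is null.
    if not non_null:
        matched.append("100% Null")
        return matched
    # Assumption: a column with any null is neither 100% Distinct nor 100% Constant.
    if len(non_null) == total:
        distinct_count = len(set(non_null))
        # 100% Distinct: every row holds a different value.
        if distinct_count == total:
            matched.append("100% Distinct")
        # 100% Constant: every row holds the same value.
        if distinct_count == 1:
            matched.append("100% Constant")
    return matched


def columns_matching(profile: Dict[str, List[Optional[Any]]], filter_name: str) -> List[str]:
    """List the columns of a hypothetical {column: values} profile that match one filter."""
    return [name for name, values in profile.items() if filter_name in with_filters(values)]


profile = {"id": [1, 2, 3], "flag": ["Y", "Y", "Y"], "note": [None, None, None]}
print(columns_matching(profile, "100% Constant"))  # ['flag']
```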

Profile results: summary view

When you open a data profiling task or choose a profile run, the summary view of the profile results appears. The summary view shows all the columns and rules in the profile run and their statistics.
The following image shows the summary view of profile results for columns and rules, with the results sorted by minimum value:
(Image: a sample summary results area that shows the profile results for columns and rules, sorted by minimum value.)
The columns and rules appear in collapsible sections. For each column or rule, you can view the value distribution; the number and percentage of null, distinct, and non-distinct values; the number of patterns; the percentage of the top pattern; the maximum value and length; and the minimum value and length.
You can sort the columns and rules by one of the statistics. To sort them, click the statistic. For example, to view the maximum values in ascending order, click Maximum Value. The columns are sorted in ascending order of maximum value.
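To make the sort behavior concrete, the following Python sketch computes a subset of the summary-view statistics for each column and orders the columns by one of them, similar to clicking Maximum Value. It covers only the null, distinct, and minimum and maximum value and length statistics. The function names are hypothetical, and ignoring nulls, measuring length on the string form of each value, placing columns without the statistic last, and requiring mutually comparable values are all assumptions.

```python
from typing import Any, Dict, List, Optional


def summarize_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Compute a subset of the summary-view statistics for one column."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    # Assumption: length is the length of the value's string form.
    lengths = [len(str(v)) for v in non_null]
    nulls = total - len(non_null)
    return {
        "null_count": nulls,
        "null_pct": 100.0 * nulls / total if total else 0.0,
        "distinct_count": len(set(non_null)),
        "max_value": max(non_null, default=None),
        "min_value": min(non_null, default=None),
        "max_length": max(lengths, default=None),
        "min_length": min(lengths, default=None),
    }


def sort_columns(profile: Dict[str, List[Optional[Any]]], statistic: str,
                 descending: bool = False) -> List[str]:
    """Order column names by one statistic; columns without that statistic go last."""
    summaries = {name: summarize_column(vals) for name, vals in profile.items()}
    present = [n for n, s in summaries.items() if s[statistic] is not None]
    missing = [n for n, s in summaries.items() if s[statistic] is None]
    present.sort(key=lambda n: summaries[n][statistic], reverse=descending)
    return present + missing


profile = {
    "age": [34, 51, None, 27],
    "score": [88, 92, 75, 92],
    "qty": [5, 3, 9, 1],
}
# Comparable to clicking Maximum Value for ascending order.
print(sort_columns(profile, "max_value"))  # ['qty', 'age', 'score']
```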

Profile results: detailed view

When you click a column in the summary view, the detailed view of the profile results for the column appears. The area shows the column values in graphical form. Null values appear as red vertical bars.
The following image shows the detailed view of the profile results area:
(Image: a sample detailed view in the summary page.)
  1. Column or rule output name
  2. Number of distinct values, which includes non-unique values and unique values
  3. Sort By
  4. Bar chart
  5. Detailed chart
  6. Value distribution table
The following list describes the properties in the detailed view:
Column <column_name> or Rule <rule_output_name>. Shows the column name or rule output name.
Back to Summary. Click the button to return to the summary view of profile results.
<total_number> distinct values (<number_of_non-unique_values>, <number_of_unique_values>). Shows the total number of distinct values in the column or rule. This property also shows the number of non-unique and unique values in the column or rule, with a color legend.
Sort By. You can sort the value frequency distribution for the date, integer, and decimal data types. Choose Frequency or Value, and then choose Ascending or Descending.
Bar chart. Shows the values as a vertical bar chart. The upper area shows a maximum of 16,000 values. Drag the slider across the values in the upper area. The lower area displays the values within the slider. Outlier values appear with an orange underline.
Detailed chart. Shows the values that you select with the slider in the upper area. By default, 50 values appear in the lower area. You can choose to view 75 or 100 values at a time. Outlier values appear with an orange underline.
Value distribution table. Shows the following statistics in tabular form:
  • #. Row or field number in the source object.
  • Value. Value in the column.
  • Frequency. Number of times the value appears in the column.
  • Percentage. Percentage of the rows in the column that contain the value.
  • Length. Length of the value.
  Outlier values appear with a vertical bar.
By default, you can view 500 values in the detailed view. To increase or decrease the number of values that you can view, configure the Maximum Number of Value Frequency Pairs option on the Schedules page, and then run the profile.
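The value distribution table pairs each value with its frequency, percentage, and length. The following Python sketch builds such rows from a list of column values and applies the Sort By choices (Frequency or Value, Ascending or Descending). It is a hypothetical illustration rather than product behavior: counting null as a value, computing the percentage over all rows, defaulting to descending frequency, and truncating to `max_pairs` rows after sorting are assumptions. The `max_pairs` parameter only stands in for the Maximum Number of Value Frequency Pairs option.

```python
from collections import Counter
from typing import Any, List, Optional, Tuple

# (value, frequency, percentage, length)
Row = Tuple[Optional[Any], int, float, Optional[int]]


def value_distribution(values: List[Optional[Any]], sort_by: str = "frequency",
                       descending: bool = True, max_pairs: int = 500) -> List[Row]:
    """Build value frequency rows for one column and sort them."""
    total = len(values)
    counts = Counter(values)  # None (null) is counted like any other value
    rows: List[Row] = [
        (value, freq, 100.0 * freq / total, None if value is None else len(str(value)))
        for value, freq in counts.items()
    ]
    if sort_by == "frequency":
        rows.sort(key=lambda r: r[1], reverse=descending)
    elif sort_by == "value":
        # Nulls sort after non-null values in ascending order.
        rows.sort(key=lambda r: (r[0] is None, r[0]), reverse=descending)
    else:
        raise ValueError("sort_by must be 'frequency' or 'value'")
    # Keep at most max_pairs rows.
    return rows[:max_pairs]


states = ["NY", "CA", "NY", "TX", "NY", "CA"]
for row in value_distribution(states):
    print(row)
```

Because Python's sort is stable, values with equal frequency keep the order in which they first appear in the column.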
To view the drilldown results for a value, perform the following steps:
  1. Select a value in the detailed view.
    The value appears as a filter in the Data Preview area.
  2. Click Apply.
    The drilldown results for the value appear in the Data Preview area.

Details and Rules

When you select a column or rule in the profile results area, the Details tab shows the trend of values across multiple profile runs, the documented and inferred data types, the inferred patterns, and the most frequent values for the selected column. If the column has a numeric documented data type, the Numeric Column Statistics section also appears for the column. The Rules area shows the rules associated with the column in the profile run.
The following image shows the Details and Rules area:
(Image: the details and rules section in the summary page.)
The following list describes the sections and statistics that appear in the Details and Rules area:
Trend. Trend chart of the percentage change in null, distinct, and non-distinct values. The line chart shows the change for a maximum of 10 profile runs, based on the profile run that you select. For example, if there are 20 profile runs and you are viewing the tenth run, the chart shows the five runs before the tenth run and the four runs after it.
Data Types <number_of_inferred_data_type>. Shows the documented data type of the column in the data source. The section also shows each inferred data type, the percentage of the column or rule values that have that data type, and a horizontal bar chart that visually represents the data type distribution. Hover over the bar chart to view the number of rows that have the inferred data type. Select a data type to drill down and view the drilldown results in the Data Preview area.
Patterns <number_of_inferred_patterns>. Shows each inferred pattern, the percentage of the column or rule values that match the pattern, and a horizontal bar chart that visually represents the pattern distribution. Hover over the bar chart to view the number of rows that have the inferred pattern. Select a pattern to drill down and view the drilldown results in the Data Preview area.
Most Frequent Values. Shows the five most frequent values in the column.
Numeric Column Statistics. Shows the following statistics for columns with a numeric documented data type:
  • Average. Average of the values in the column.
  • Sum. Sum of all the values in the column.
  • Standard Deviation. Standard deviation of the values in the column, which measures how much the values vary.
  • #Zero. Number of rows that contain the value 0 in the column or rule.
  • %Zero. Percentage of rows that contain the value 0 in the column or rule.
Rules. Shows the rules associated with the column and the rule details.
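The Trend and Numeric Column Statistics sections described above both rest on simple arithmetic. The following Python sketch computes the window of profile runs that a trend chart of at most 10 runs covers, and the five numeric column statistics. The function names are hypothetical. The documented example (viewing run 10 of 20 shows runs 5 through 14) fixes only the middle case; how the window behaves near the first or latest run, the use of the population standard deviation rather than the sample standard deviation, ignoring nulls, and computing %Zero over all rows are assumptions.

```python
import statistics
from typing import Dict, List, Optional


def trend_window(selected_run: int, total_runs: int, size: int = 10, before: int = 5) -> List[int]:
    """Return the run numbers that a trend chart of at most `size` runs would cover."""
    start = max(1, selected_run - before)
    end = min(total_runs, start + size - 1)
    # Assumption: near the latest run, the window shifts back to keep `size` runs.
    start = max(1, end - size + 1)
    return list(range(start, end + 1))


def numeric_column_stats(values: List[Optional[float]]) -> Dict[str, float]:
    """Compute Average, Sum, Standard Deviation, #Zero, and %Zero for one column."""
    non_null = [v for v in values if v is not None]  # assumption: nulls are ignored
    zeros = sum(1 for v in non_null if v == 0)
    return {
        "average": statistics.fmean(non_null),  # raises StatisticsError if all null
        "sum": sum(non_null),
        # Assumption: population standard deviation, not the sample stdev().
        "std_dev": statistics.pstdev(non_null),
        "zero_count": zeros,
        # Assumption: the percentage is taken over all rows, including nulls.
        "zero_pct": 100.0 * zeros / len(values),
    }


print(trend_window(10, 20))  # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(numeric_column_stats([0, 2, 4, None]))
```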

Data Preview

When you open a profile, the Data Preview area shows a maximum of 10 rows of the profile run results. When you select a column in the summary view of profile results, the column is highlighted in the area.
To view the drilldown results in the Data Preview area, perform one of the following actions:
  • Choose a value in the detailed results area.
  • Choose a pattern or data type in the Details and Rules area.
After you choose a value, pattern, or data type, it appears as a filter in the Data Preview area. Add more statistics or values if required, and then click Apply to view the filtered drilldown results. To change the selected data type, pattern, or value, select the required statistic or value from the drop-down list. Data Profiling creates and runs a subtask when you click Apply after you add or change a statistic or value.
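Conceptually, each filter in the Data Preview area restricts the rows that the drilldown returns. The following Python sketch models that behavior on rows held in memory; it does not reflect how Data Profiling runs the subtask. The functions `matches` and `drill_down` and the sample rows are hypothetical, and combining multiple filters with AND is an assumption, because this section does not state how filters combine. A filter is either a literal value (like a value chosen in the detailed view) or a predicate (standing in for a data type or pattern choice).

```python
from typing import Any, Callable, Dict, List, Union

# A condition is a literal value to compare against, or a predicate on the cell.
Condition = Union[Any, Callable[[Any], bool]]


def matches(cell: Any, condition: Condition) -> bool:
    """Apply one filter condition to one cell."""
    return condition(cell) if callable(condition) else cell == condition


def drill_down(rows: List[Dict[str, Any]], filters: Dict[str, Condition]) -> List[Dict[str, Any]]:
    """Return the rows that satisfy every filter (AND semantics assumed)."""
    return [
        row for row in rows
        if all(matches(row.get(column), condition) for column, condition in filters.items())
    ]


rows = [
    {"Name": "Ann", "State": "NY", "Age": 34},
    {"Name": "Bo", "State": "CA", "Age": 51},
    {"Name": "Cy", "State": "NY", "Age": "n/a"},
]
# A value filter, as when you select NY in the detailed view.
print([r["Name"] for r in drill_down(rows, {"State": "NY"})])  # ['Ann', 'Cy']
# Adding a data-type-style filter narrows the result further.
print([r["Name"] for r in drill_down(rows, {"State": "NY", "Age": lambda v: isinstance(v, int)})])  # ['Ann']
```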
For example, suppose that you are a data analyst and you want to view duplicate data for SSN in the Customer table. To accomplish this task, perform the following actions:
  1. Create a data profiling task for the Customer table.
  2. Run the profile.
  3. In the profile results, click the 999-99-9999 pattern for SSN.
The Data Preview area shows all the rows with the pattern 999-99-9999.
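The SSN example relies on reducing each value to a pattern in which digits become 9. The following Python sketch reproduces that pattern drilldown on sample rows held in memory. The digit-to-9 mapping follows the 999-99-9999 pattern shown above; the letter symbols X and x, the sample Customer rows, and the names `infer_pattern` and `rows_with_pattern` are hypothetical and are not taken from the product.

```python
from typing import Any, Dict, List


def infer_pattern(value: Any) -> str:
    """Map each digit to 9 and keep punctuation; letter symbols are hypothetical."""
    symbols = []
    for ch in str(value):
        if ch.isdigit():
            symbols.append("9")
        elif ch.isalpha():
            # Hypothetical: X for uppercase and x for lowercase letters.
            symbols.append("X" if ch.isupper() else "x")
        else:
            symbols.append(ch)
    return "".join(symbols)


def rows_with_pattern(rows: List[Dict[str, Any]], column: str, pattern: str) -> List[Dict[str, Any]]:
    """Return the non-null rows whose value in `column` has the given pattern."""
    return [
        row for row in rows
        if row.get(column) is not None and infer_pattern(row[column]) == pattern
    ]


customers = [
    {"id": 1, "SSN": "123-45-6789"},
    {"id": 2, "SSN": "123456789"},
    {"id": 3, "SSN": "987-65-4321"},
    {"id": 4, "SSN": None},
]
print([row["id"] for row in rows_with_pattern(customers, "SSN", "999-99-9999")])  # [1, 3]
```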
