Add rules to the profile

You can add rule specification, cleanse, parse, and verifier assets as rules to a profile. You create these assets in Data Quality. You can add a Data Quality asset as a rule if you have Read permission on the asset. You can also profile passive mapplets, which may or may not contain Data Quality assets. Profiling calculates the statistics on all the output ports of the mapplet, including value frequencies.
You can add one or more rules for a data profiling task. You can also run a profile without a rule. Data Profiling displays column statistics and rule results in collapsible sections in the results area. The results for each rule output appear in a separate row.
In Data Quality, when you create rule specification, cleanse, parse, or verifier assets, you configure inputs, rule logic, and outputs for the asset. When you add the asset as a rule in Data Profiling, the input appears as an input column and the output appears as a rule output. You can add single-input, single-output rules and multiple-input, single-output rules to profiles. When you add a rule to the profile, you assign a source column to the input column. When you run the profile, Data Profiling generates statistics based on the rule logic. The Results tab shows the rule output statistics in a separate row.
For example, a rule specification 'Validity' has an input called in_value, a rule logic, and an output called out_validity. You want to perform an analysis on a source column called 'customer-national_ID' in the Customer table. To accomplish this task, you perform the following steps:
  1. On the Rules tab, you click Add to add a rule to the profile.
  2. In the Add Rule dialog box, you select the 'Validity' rule.
  3. In the Rule Settings dialog box, you select the column 'customer-national_ID' as the input column. Data Profiling assigns the selected column to input 'in_value'.
  4. You run the profile.
  5. Data Profiling generates the rule statistics based on the rule logic.
  6. On the Results tab, the rule statistics appear in the 'out_validity' row.
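The mapping from source column to rule input to rule output can be pictured with a short Python sketch. This is only an illustration, not how Data Profiling is implemented: the function names `validity_rule` and `profile_rule_output` are hypothetical, and the nine-digit check is an assumed stand-in for the real 'Validity' rule logic, which the example above does not specify.

```python
from collections import Counter

def validity_rule(in_value):
    """Hypothetical rule logic (assumption): valid means exactly nine digits."""
    is_valid = in_value is not None and in_value.isdigit() and len(in_value) == 9
    return "Valid" if is_valid else "Invalid"

def profile_rule_output(rows, source_column, rule):
    """Feed one source column into the rule input and count each rule output value."""
    return Counter(rule(row.get(source_column)) for row in rows)

# Sample rows standing in for the Customer table (made-up values).
customer_rows = [
    {"customer-national_ID": "123456789"},
    {"customer-national_ID": "12345"},
    {"customer-national_ID": None},
    {"customer-national_ID": "987654321"},
]

# The source column is assigned to in_value; the counts play the role of the out_validity row.
out_validity = profile_rule_output(customer_rows, "customer-national_ID", validity_rule)
print(dict(out_validity))
```

The counter holds value frequencies for the single rule output, which is roughly the kind of statistic the 'out_validity' row summarizes.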
When you add a single input rule, you can assign multiple columns to it. Data Profiling replicates the rule for each column. When you add a multiple input rule to a profile, you can add a column for each input in the rule. Data Profiling displays results for each selected column in a separate row.
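As a rough sketch of what replicating a single input rule means, the following Python applies one rule function to each selected column and keeps a separate result per column. The rule `is_blank`, the helper `replicate_rule`, and the sample rows are all hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

def is_blank(in_value):
    """Hypothetical single-input rule: flag missing or whitespace-only values."""
    return "Blank" if in_value is None or in_value.strip() == "" else "Populated"

def replicate_rule(rows, columns, rule):
    """Apply the same single-input rule to each column; one result entry per column."""
    return {column: Counter(rule(row.get(column)) for row in rows) for column in columns}

rows = [
    {"FirstName": "Ana", "LastName": "Diaz"},
    {"FirstName": "", "LastName": "Lee"},
    {"FirstName": None, "LastName": "  "},
]

results = replicate_rule(rows, ["FirstName", "LastName"], is_blank)
for column, frequencies in results.items():
    # Each selected column gets its own result row.
    print(column, dict(frequencies))
```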
You can add the following Data Quality assets as rules to a profile:

Rule specification

Use this asset to define a business rule with a set of conditions that you can use to evaluate your data. You can add rule specifications that have a single output.
A rule specification can also contain a single passive mapplet or nested passive mapplets. You can use mapplets that contain passive transformations in a rule specification. You can use the following assets in a mapplet:
  • Parse
  • Cleanse
  • Labeler
  • Rule specification
  • Verifier
  • Expression
  • Java
  • Mapplet that contains passive transformations
For more information about using mapplets in rule specifications, see Rule specification assets in the Data Quality documentation.
For example, you are a sales analyst and you want to analyze the retail sales in the Sales table.
  1. In Data Quality, you perform the following steps:
    1. Create a rule specification named Reg_pyr.
    2. Add Region and SalesYear as the inputs.
    3. Create the rule logic and test it.
    4. Save the rule specification.
  2. In Data Profiling, you perform the following steps:
    1. Create a profile on the Sales table.
    2. Add the Reg_pyr rule to the profile and choose the Region and SalesYear source columns for the rule.
    3. Save and run the profile.
    4. View the results on the Results tab. Optionally, export the results to a Microsoft Excel file or run a query that generates the content into a delimited file for further analysis.
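To make the multiple-input, single-output shape of a rule such as Reg_pyr concrete, here is a minimal Python sketch. The rule logic is not given in the example above, so the region list and year range below are assumptions, as are the names `KNOWN_REGIONS` and `reg_pyr` and the sample rows.

```python
from collections import Counter

# Hypothetical reference values; the real rule logic is defined in Data Quality.
KNOWN_REGIONS = {"North", "South", "East", "West"}

def reg_pyr(region, sales_year):
    """Assumed logic: two inputs (Region, SalesYear) produce one output value."""
    if region in KNOWN_REGIONS and sales_year is not None and 2000 <= sales_year <= 2100:
        return "Valid"
    return "Invalid"

sales_rows = [
    {"Region": "North", "SalesYear": 2021},
    {"Region": "Central", "SalesYear": 2021},
    {"Region": "West", "SalesYear": 1890},
    {"Region": "East", "SalesYear": 2023},
]

# One source column is supplied per rule input; the single output is then profiled.
rule_output = Counter(reg_pyr(row["Region"], row["SalesYear"]) for row in sales_rows)
print(dict(rule_output))
```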

Cleanse

Use this asset as a rule to standardize the appearance of your data, replace incorrect values in your data, and remove unwanted values from your data.
For example, you are a data analyst and you want to convert the FirstName and LastName columns in the Customer table to title case for better readability. To accomplish this task, you perform the following steps:
  1. In Data Quality, you perform the following steps:
    1. Create a cleanse asset named FN_SenC.
    2. Add a step sequence and choose Title Case as the casing style.
    3. Save the asset.
    4. Test the asset with sample data.
  2. In Data Profiling, you perform the following steps:
    1. Create a profile on the Customer table.
    2. Add the FN_SenC rule to the profile and choose the FirstName and LastName columns for the rule.
    3. Save and run the profile.
    4. View the results on the Results tab. Optionally, export the results to a Microsoft Excel file or run a query that generates the content into a delimited file for further analysis.
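The effect of a title-case cleanse step can be approximated in a few lines of Python. This sketch is an assumption about what title casing does to typical name values. It is not the Cleanse asset's actual algorithm, and the function name `to_title_case` is hypothetical.

```python
def to_title_case(value):
    """Capitalize the first letter of each whitespace-separated word; lowercase the rest.

    Note: split() also collapses repeated whitespace, which a real cleanse step may not do.
    """
    if value is None:
        return None
    return " ".join(word.capitalize() for word in value.split())

customer_rows = [
    {"FirstName": "JOHN", "LastName": "smith"},
    {"FirstName": "mary ann", "LastName": "O'NEIL"},
]

# Apply the same cleanse logic to both selected columns.
cleansed = [
    {column: to_title_case(row[column]) for column in ("FirstName", "LastName")}
    for row in customer_rows
]
print(cleansed)
```

Names with internal capitals or apostrophes come out as "O'neil" under this simple approach, so it is worth testing an asset with representative sample data before profiling.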

Verifier

Use this asset as a rule to measure and enhance the quality of your postal address data. You can add a Verifier asset in the Verification only mode to a profile.
For example, you are a data analyst and the marketing department wants to send new product brochures to potential customers in the state of California. They want to evaluate the accuracy and deliverability of the address records in the Leads table before they send the brochures. To accomplish this task, you perform the following steps:
  1. In Data Quality, you perform the following steps:
    1. Create a verifier asset named Cal_addr.
    2. Select the appropriate address model for the input address structure and specify the input and output fields.
    3. In the Process tab properties, choose Verification only as the verification mode.
    4. Save the asset.
  2. In Data Profiling, you perform the following steps:
    1. Create a profile on the Leads table.
    2. Add the Cal_addr rule to the profile and choose the Address1 and Address2 columns for the rule.
    3. Save and run the profile.
    4. View the results on the Results tab. Optionally, export the results to a Microsoft Excel file or run a query that generates the content into a delimited file for further analysis.

Parse

Use a parse asset to improve the structure of your data. A parse asset defines a set of operations that can identify discrete values in an input field and write the values to appropriate output fields.
For example, you are a data analyst and you need to find out information about potential customers from the list of email addresses. The data source includes emails of people who contacted your organization. You need to share the results with the sales department so that they can pursue the new customers. To accomplish this task, you perform the following steps:
  1. In Data Quality, you perform the following steps:
    1. Create a parse asset named Email_parse.
    2. Add the Regular Expression parse step.
    3. Select the Parse Email built-in regular expression.
    4. Enter Name, Company, and Domain as the output fields.
    5. Save the asset.
  2. In Data Profiling, you perform the following steps:
    1. Create a profile on the customer details table.
    2. Add the Email_parse rule to the profile and choose the Email_ID source column for the rule.
    3. Save and run the profile.
    4. View the results on the Results tab. Optionally, export the results to a Microsoft Excel file or run a query that generates the content into a delimited file for further analysis.
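The following Python sketch shows one way a regular expression could split an email address into Name, Company, and Domain outputs. The pattern is an assumption written for illustration. It is not the built-in Parse Email expression, whose definition is not shown here, and `parse_email` is a hypothetical helper.

```python
import re

# Assumed pattern: <Name>@<Company>.<Domain>; the built-in expression may differ.
EMAIL_PATTERN = re.compile(
    r"^(?P<Name>[^@\s]+)@(?P<Company>[^.@\s]+)\.(?P<Domain>[^@\s]+)$"
)

def parse_email(email_id):
    """Write the discrete parts of an email address to three output fields."""
    match = EMAIL_PATTERN.match(email_id or "")
    if match is None:
        # Values that do not fit the pattern leave every output field empty.
        return {"Name": None, "Company": None, "Domain": None}
    return match.groupdict()

for email_id in ["jane.doe@example.com", "not-an-email", None]:
    print(email_id, parse_email(email_id))
```

Profiling the three outputs separately is what lets you see, for instance, which Company values occur most often in the Email_ID column.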
You cannot add rules if the rule input or rule output name exceeds 4000 bytes. When you open a Data Quality asset that is associated with a profile, the Used by section on the Asset References tab shows the profile name.
For information about creating a rule specification, cleanse, verifier, or parse asset, see Data Quality in the Data Quality help.
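Because the limit on rule input and output names is stated in bytes rather than characters, names that contain multibyte characters reach it sooner. The sketch below shows the difference, assuming UTF-8 encoding. The encoding that Data Profiling actually uses is not stated here, and `name_within_limit` is a hypothetical helper.

```python
MAX_NAME_BYTES = 4000  # limit stated for rule input and rule output names

def name_within_limit(name, encoding="utf-8"):
    """Return True when the encoded name is at most 4000 bytes (encoding is an assumption)."""
    return len(name.encode(encoding)) <= MAX_NAME_BYTES

print(name_within_limit("out_validity"))  # short ASCII name
print(name_within_limit("a" * 4001))      # 4001 single-byte characters
print(name_within_limit("é" * 2001))      # 2001 characters, two bytes each in UTF-8
```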

Mapplet

Use a mapplet to transform the source data. You can add passive mapplets as rules to a profile. A mapplet is reusable transformation logic that you can use to transform source data before it is loaded into the target.
For example, you are a data analyst and you want to concatenate the first name and last name of customers in the Customer table to get the full name of customers. To accomplish this task, perform the following steps:
  1. In Data Integration, you perform the following steps:
    1. Create a mapplet asset named Concatenate_mapplet.
    2. Add FirstName and LastName as the mapplet inputs.
    3. Add an Expression transformation to the mapplet.
    4. Add FullName as the mapplet output.
    5. Validate and save the mapplet.
  2. In Data Profiling, you perform the following steps:
    1. Create a profile on the Customer table.
    2. Add the Concatenate_mapplet rule to the profile and choose the FirstName and LastName source columns for the rule.
    3. Save and run the profile.
    4. View the results on the Results tab. Optionally, export the results to a Microsoft Excel file or run a query that generates the content into a delimited file for further analysis.
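A passive mapplet returns one output row for every input row, which the following Python sketch imitates for the concatenation example. The function `concatenate_mapplet` and its handling of missing values are assumptions made for illustration, not the behavior of an actual Expression transformation.

```python
def concatenate_mapplet(first_name, last_name):
    """Hypothetical stand-in for the mapplet: FirstName and LastName in, FullName out."""
    parts = [part.strip() for part in (first_name, last_name) if part and part.strip()]
    return " ".join(parts)

customer_rows = [
    {"FirstName": "Ana", "LastName": "Diaz"},
    {"FirstName": "Wei", "LastName": None},
    {"FirstName": " Sam ", "LastName": "Okafor"},
]

full_names = [
    concatenate_mapplet(row["FirstName"], row["LastName"]) for row in customer_rows
]

# Passive behavior: the number of output rows equals the number of input rows.
assert len(full_names) == len(customer_rows)
print(full_names)
```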
For information about creating mapplets, see Mapplets in Data Integration.
  • You cannot add active mapplets to a profile.
  • Mapplets work only for profiles that run on the native engine. They do not work for profiles that run on the Spark engine.
  • Mapplets are of three types: Data Integration, PowerCenter, and SAP. You can use only Data Integration and PowerCenter mapplets in Data Profiling.
  • Mapplets that support parameters or require a connection for lookups are not supported in Data Profiling.
  • You can use the following assets in a mapplet:
    • Parse
    • Cleanse
    • Labeler
    • Rule specification
    • Verifier
    • Expression
    • Java
    • Nested mapplet
  • Data Integration provides other transformations that you can use in a mapplet. However, you cannot use these transformations in Data Profiling because they make the mapplet active. For information about other transformations, see Transformations in Data Integration.
