You can add rule specification, cleanse, parse, and verifier assets as rules to a profile. You create these assets in Data Quality. You can add a Data Quality asset as a rule if you have Read permission on the asset. You can also profile passive mapplets, which may or may not contain Data Quality assets. Data Profiling calculates statistics, including value frequencies, on all the output ports of the mapplet.
You can add one or more rules for a data profiling task. You can also run a profile without a rule.
Data Profiling displays column statistics and rule results in collapsible sections in the results area. The results for each rule output appear in a separate row.
In Data Quality, when you create rule specification, cleanse, parse, or verifier assets, you configure inputs, rule logic, and outputs for the asset. When you add the asset as a rule in Data Profiling, each input appears as an input column and the output appears as a rule output. You can add single-input, single-output rules and multiple-input, single-output rules to profiles. When you add a rule to the profile, you assign a source column to each input column. When you run the profile, Data Profiling generates statistics based on the rule logic. The Results tab shows the rule output statistics in a separate row.
For example, a rule specification 'Validity' has an input called in_value, rule logic, and an output called out_validity. You want to analyze a source column called 'customer-national_ID' in the Customer table. To accomplish this task, you perform the following steps:
On the Rules tab, you click Add to add a rule to the profile.
In the Add Rule dialog box, you select the 'Validity' rule.
In the Rule Settings dialog box, you select the column 'customer-national_ID' as the input column. Data Profiling assigns the selected column to the input 'in_value'.
You run the profile. Data Profiling generates the rule statistics based on the rule logic.
On the Results tab, the rule statistics appear in the 'out_validity' row.
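As a rough illustration of what these steps amount to, the following Python sketch models a rule as a function from an input value to an output value and computes value frequencies on the output. The validity check (a nine-digit test) and the sample values are hypothetical, because the source does not specify the rule logic. This is a conceptual sketch, not how Data Profiling is implemented.

```python
from collections import Counter

def validity_rule(in_value):
    """Hypothetical rule logic: a value is valid if it is exactly nine digits."""
    text = "" if in_value is None else str(in_value)
    return "Valid" if len(text) == 9 and text.isdigit() else "Invalid"

def profile_rule(rule, column_values):
    """Apply the rule to every value of the assigned source column and
    return the value frequencies of the rule output."""
    return Counter(rule(value) for value in column_values)

# Made-up sample data standing in for the 'customer-national_ID' column.
customer_national_id = ["123456789", "98765432", None, "111222333", "ABC"]

# The column is assigned to the rule input 'in_value'; the statistics are
# reported against the rule output 'out_validity'.
out_validity_stats = profile_rule(validity_rule, customer_national_id)
print(dict(out_validity_stats))
```

The point of the sketch is the mapping: one source column feeds one rule input, and the statistics are computed on the rule output rather than on the source column.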
When you add a single-input rule, you can assign multiple columns to it. Data Profiling replicates the rule for each column. When you add a multiple-input rule to a profile, you can add a column for each input in the rule. Data Profiling displays results for each selected column in a separate row.
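The difference between the two cases can be sketched in Python. A single-input rule is evaluated once per assigned column, while a multiple-input rule binds each input to one column and is evaluated row by row. The helper names, the two rules, and the sample table are all hypothetical and only illustrate the idea.

```python
def profile_single_input_rule(rule, table, columns):
    """Replicate a single-input rule: one set of results per assigned column."""
    return {column: [rule(value) for value in table[column]] for column in columns}

def profile_multi_input_rule(rule, table, input_to_column):
    """Bind each rule input to one source column, then evaluate row by row."""
    input_names = list(input_to_column)
    bound_columns = [table[input_to_column[name]] for name in input_names]
    return [rule(**dict(zip(input_names, row))) for row in zip(*bound_columns)]

# Hypothetical rules and sample data.
def is_not_blank(in_value):
    return bool(in_value and in_value.strip())

def names_differ(first, last):
    return first != last

table = {"FirstName": ["Ann", " ", "Lee"], "LastName": ["Lee", "Ray", "Lee"]}

single_results = profile_single_input_rule(is_not_blank, table, ["FirstName", "LastName"])
multi_results = profile_multi_input_rule(
    names_differ, table, {"first": "FirstName", "last": "LastName"}
)
print(single_results)
print(multi_results)
```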
You can add the following Data Quality assets as rules to a profile:
Rule specification
Use this asset to define a business rule with a set of conditions that you can use to evaluate your data. You can add rule specifications that have a single output.
A rule specification can also contain a single passive mapplet or nested passive mapplets. You can use mapplets that contain passive transformations in a rule specification. You can use the following assets in a mapplet:
Parse
Cleanse
Labeler
Rule specification
Verifier
Expression
Java
Mapplet that contains passive transformations
For more information about using mapplets in rule specifications, see Rule specification assets in the Data Quality documentation.
For example, you are a sales analyst and you want to analyze the retail sales in the Sales table.
In Data Quality, you perform the following steps:
Create a rule specification named Reg_pyr.
Add Region and SalesYear as the inputs.
Create the rule logic and test it.
Save the rule specification.
In Data Profiling, you perform the following steps:
Create a profile on the Sales table.
Add the Reg_pyr rule to the profile and choose the Region and SalesYear source columns for the rule.
Save and run the profile.
View the results on the Results tab. Optionally, export the results to a Microsoft Excel file or run a query that writes the content to a delimited file for further analysis.
Cleanse
Use this asset as a rule to standardize the appearance of your data, replace incorrect values in your data, and remove unwanted values from your data.
For example, you are a data analyst and you want to convert the FirstName and LastName columns in the Customer table to title case for better readability. To accomplish this task, you perform the following steps:
In Data Quality, you perform the following steps:
Create a cleanse asset named FN_SenC.
Add a step sequence and choose Title Case as the casing style.
Save the asset.
Test the asset with sample data.
In Data Profiling, you perform the following steps:
Create a profile on the Customer table.
Add the FN_SenC rule to the profile and choose the FirstName and LastName columns for the rule.
Save and run the profile.
View the results on the Results tab. Optionally, export the results to a Microsoft Excel file or run a query that writes the content to a delimited file for further analysis.
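To show why a casing rule is useful in a profile, the following Python sketch applies a simple title-case function to sample values and compares the value frequencies before and after. The function is an approximation written for this example. The exact behavior of the Title Case casing style, for instance with apostrophes or hyphens, is not described in the source and may differ.

```python
from collections import Counter

def title_case(value):
    """Uppercase the first letter of each whitespace-separated word and
    lowercase the rest. Returns None for a null input."""
    if value is None:
        return None
    return " ".join(word[:1].upper() + word[1:].lower() for word in value.split())

# Made-up sample values standing in for the FirstName column.
first_names = ["john", "JOHN", "John", "mary ann"]
cleansed = [title_case(name) for name in first_names]

before = Counter(first_names)
after = Counter(cleansed)
print(len(before), "distinct values before;", len(after), "after")
```

After standardization, casing variants of the same name collapse into one value, so the frequency statistics on the rule output describe distinct names rather than distinct spellings.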
Verifier
Use this asset as a rule to measure and enhance the quality of your postal address data. You can add a verifier asset in Verification only mode to a profile.
For example, you are a data analyst and the marketing department wants to send new product brochures to potential customers in the state of California. The department wants to evaluate the accuracy and deliverability of the address records in the Leads table before sending the brochures. To accomplish this task, you perform the following steps:
In Data Quality, you perform the following steps:
Create a verifier asset named Cal_addr.
Select the appropriate address model for the input address structure and specify the input and output fields.
In the Process tab properties, choose Verification only as the verification mode.
Save the asset.
In Data Profiling, you perform the following steps:
Create a profile on the Leads table.
Add the Cal_addr rule to the profile and choose the Address1 and Address2 columns for the rule.
Save and run the profile.
View the results on the Results tab. Optionally, export the results to a Microsoft Excel file or run a query that writes the content to a delimited file for further analysis.
Parse
Use a parse asset to improve the structure of your data. A parse asset defines a set of operations that can identify discrete values in an input field and write the values to appropriate output fields.
For example, you are a data analyst and you need to find out information about potential customers from a list of email addresses. The data source includes emails of people who contacted your organization. You need to share the results with the sales department so that they can pursue the new customers. To accomplish this task, you perform the following steps:
In Data Quality, you perform the following steps:
Create a parse asset named Email_parse.
Add the Regular Expression parse step.
Select the Parse Email built-in regular expression.
Enter Name, Company, and Domain as the output fields.
Save the asset.
In Data Profiling, you perform the following steps:
Create a profile on the customer details table.
Add the Email_parse rule to the profile and choose the Email_ID source column for the rule.
Save and run the profile.
View the results on the Results tab. Optionally, export the results to a Microsoft Excel file or run a query that writes the content to a delimited file for further analysis.
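The parsing idea can be sketched in Python with a regular expression that uses named groups for the three output fields. The pattern below is hypothetical. The source does not show the built-in Parse Email expression, and how it splits an address into Name, Company, and Domain is assumed here.

```python
import re

# Hypothetical pattern: text before '@' is Name, the first label after '@'
# is Company, and everything after the next '.' is Domain.
EMAIL_PATTERN = re.compile(
    r"^(?P<Name>[^@\s]+)@(?P<Company>[^@\s.]+)\.(?P<Domain>[^@\s]+)$"
)

def parse_email(email_id):
    """Split one input value into the Name, Company, and Domain outputs.
    Values that do not match produce null outputs."""
    match = EMAIL_PATTERN.match(email_id.strip()) if email_id else None
    if match is None:
        return {"Name": None, "Company": None, "Domain": None}
    return match.groupdict()

print(parse_email("jane.doe@example.com"))
print(parse_email("not-an-email"))
```

Each output field then becomes a rule output that can be profiled on its own, for example to see which Company values occur most often.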
You cannot add a rule if the rule input name or rule output name exceeds 4000 bytes. When you open a Data Quality asset that is associated with a profile, the Used by section on the Asset References tab shows the profile name.
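Because the limit on rule input and output names is expressed in bytes rather than characters, a name with non-ASCII characters can exceed it with fewer than 4000 characters. The Python sketch below illustrates the distinction. The UTF-8 encoding is an assumption, since the source does not state which encoding the limit applies to, and the helper name is hypothetical.

```python
MAX_NAME_BYTES = 4000

def name_within_limit(name, encoding="utf-8"):
    """Return True if the encoded name fits in the 4000-byte limit.
    The encoding is an assumption, not documented behavior."""
    return len(name.encode(encoding)) <= MAX_NAME_BYTES

ascii_name = "a" * 4000          # 4000 characters, 4000 bytes in UTF-8
accented_name = "\u00e9" * 2001  # 2001 characters, 4002 bytes in UTF-8

print(name_within_limit(ascii_name))
print(name_within_limit(accented_name))
```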
For information about creating a rule specification, cleanse, verifier, or parse asset, see Data Quality in the Data Quality help.
Mapplet
Use a mapplet to transform the source data. You can add passive mapplets as rules to a profile. A mapplet is reusable transformation logic that you can use to transform source data before it is loaded into the target.
For example, you are a data analyst and you want to concatenate the first name and last name of customers in the Customer table to get the full name of customers. To accomplish this task, you perform the following steps:
In Data Integration, you perform the following steps:
Create a mapplet asset named Concatenate_mapplet.
Add FirstName and LastName as the mapplet inputs.
Add an Expression transformation to the mapplet.
Add FullName as the mapplet output.
Validate and save the mapplet.
In Data Profiling, you perform the following steps:
Create a profile on the Customer table.
Add the Concatenate_mapplet rule to the profile and choose the FirstName and LastName source columns for the rule.
Save and run the profile.
View the results on the Results tab. Optionally, export the results to a Microsoft Excel file or run a query that writes the content to a delimited file for further analysis.
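The logic of such a mapplet can be sketched in Python as a function that takes the two inputs and returns one output. The space separator and the handling of null or blank values are assumptions made for this example. In Data Integration, you would write the logic in the Expression transformation instead.

```python
def concatenate_mapplet(first_name, last_name):
    """Return FullName from FirstName and LastName.
    Null or blank parts are skipped (an assumption for this sketch)."""
    parts = [part.strip() for part in (first_name, last_name) if part and part.strip()]
    return " ".join(parts)

# Made-up sample rows standing in for the Customer table.
customers = [("Ada", "Lovelace"), ("Grace", None), (" Alan ", "Turing")]
full_names = [concatenate_mapplet(first, last) for first, last in customers]

# A passive transformation does not change the number of rows.
assert len(full_names) == len(customers)
print(full_names)
```

The row-count check reflects why only passive mapplets fit a profile: each source row yields exactly one output value, so the statistics on FullName line up with the rows of the source table.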
For information about creating mapplets, see Mapplets in Data Integration.
You cannot add active mapplets to a profile.
Mapplets work only for profiles that run on the native engine. They do not work for profiles that run on the Spark engine.
Mapplets are of three types: Data Integration, PowerCenter, and SAP. You can use only Data Integration and PowerCenter mapplets in Data Profiling.
Mapplets that support parameters or require a connection for lookups cannot be used in Data Profiling.
You can use the following assets in a mapplet:
Parse
Cleanse
Labeler
Rule specification
Verifier
Expression
Java
Nested mapplet
There are other transformations available in Data Integration that you can use in a mapplet. However, you cannot use these transformations in Data Profiling because they make the mapplet active. For information about other transformations, see
Transformations in