To perform pattern-based parsing, create a parse asset in Data Quality and select the Pre-built parsing mode.
Optionally, test the asset configuration. Use the test results to update the pattern logic that the asset uses.
Prerequisites
Before you perform pattern-based parsing, verify that the CDQ_Name_Parsing_Reference_Data_Bundle is present in the Add-On Bundles folder in Explorer.
Testing strategies for pattern-based parsing
You can test the asset both before and after you use it in a mapping. For example, you might test the asset and update the pattern data before your first mapping run. You might then use the results of the first mapping run to update the pattern data again so that the mapping parses the source data more comprehensively.
When you test the asset before you run a mapping, you add one or more patterns from the test results to the pattern-based logic in the asset. When you test the asset with the results of a mapping, you can add both the patterns and the names that the patterns identify to the pattern-based logic in the asset.
Process flow
The following steps summarize the configuration process:
If necessary, install the CDQ_Name_Parsing_Reference_Data_Bundle asset bundle.
Select the Pre-built parsing mode.
Select the locale and the format of the data that the operation will read.
After you verify the locale and the data format, the asset is ready to use in a Parse transformation. You can use the steps that follow to enhance the pattern-based logic that the asset applies to your data.
You can test and enhance the pattern-based logic before you add the asset to the Parse transformation. Alternatively, you can run a mapping that contains the transformation and use the mapping results to update the pattern-based logic.
Test a sample of your data in the parse asset. Enter the values that you want to test, or import a file that contains the values.
You can import a file and perform the test on your data in the following ways:
Import a file that contains name data, and run the test. You might perform this step to enhance the pattern logic before you add the asset to a transformation in a mapping.
Import a file that includes both names and patterns, and run the test. You might perform this step to enhance the pattern logic with the results of the mapping that you ran.
Review the results of the test.
Find any name that the test failed to parse. Copy the pattern for each name to a CSV or Microsoft Excel file. Optionally, copy the name along with the pattern.
Import the file that contains the pattern data and, optionally, the name data to the asset. Use the Add User-Defined Patterns option to import the file data.
Map the values in each pattern to appropriate fields in the user-defined pattern grid. Then, save the asset.
When you map a pattern to the appropriate fields, you train the asset to recognize names with a structure that matches the pattern.
Run the test again and review the results.
You can import additional patterns to the asset to further improve the pattern parsing logic.
When you are satisfied with the performance of the parsing operation on your sample data, save the asset. A Data Integration user can add the asset to a Parse transformation in a mapping and run the mapping on your data.
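The step that copies patterns for unparsed names to a CSV file can also be scripted when the test results are large. The following Python sketch is an illustration only. The `test_results` structure, the `pattern` and `name` column names, and the pattern notation are hypothetical assumptions and do not represent the format that Data Quality exports or expects. Adjust them to match the test results and the import file layout that you work with.

```python
import csv
import io

# Hypothetical test results. The keys and the pattern notation are
# illustrative assumptions, not the format that Data Quality produces.
test_results = [
    {"name": "Maria Lopez", "pattern": "FIRSTNAME LASTNAME", "parsed": True},
    {"name": "Lopez, Maria J.", "pattern": "LASTNAME , FIRSTNAME INITIAL", "parsed": False},
    {"name": "Dr Maria Lopez", "pattern": "TITLE FIRSTNAME LASTNAME", "parsed": False},
    {"name": "Lopez, Ana K.", "pattern": "LASTNAME , FIRSTNAME INITIAL", "parsed": False},
]


def collect_unparsed(results, include_names=True):
    """Return one row per unparsed result, skipping duplicate rows.

    Each row holds the pattern and, optionally, the name that produced it.
    """
    seen = set()
    rows = []
    for result in results:
        if result["parsed"]:
            continue  # The test parsed this name, so no new pattern is needed.
        if include_names:
            row = (result["pattern"], result["name"])
        else:
            row = (result["pattern"],)
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows


def write_pattern_csv(rows, handle, include_names=True):
    """Write a header row and the collected rows as CSV to an open handle."""
    writer = csv.writer(handle)
    writer.writerow(["pattern", "name"] if include_names else ["pattern"])
    writer.writerows(rows)


# Write to an in-memory buffer here. To create a file instead, pass a handle
# opened with open(path, "w", newline="").
buffer = io.StringIO()
write_pattern_csv(collect_unparsed(test_results), buffer)
print(buffer.getvalue())
```

Call `collect_unparsed(test_results, include_names=False)` and `write_pattern_csv(rows, handle, include_names=False)` to produce a file that contains patterns only, which corresponds to testing the asset before you run a mapping. The sketch removes duplicate rows so that each pattern, or each pattern and name pair, appears once in the file.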