A Parse transformation creates the output fields that the parse asset specifies. The type and number of output fields depend on the parsing operations that the user configures in the parse asset. View the output fields on the Output Fields tab of the Properties panel.
The Parse transformation can create some or all of the following types of output fields:
Parsed data fields
Contain data values that meet the parsing criteria that the asset defines.
Overflow fields
Contain data values that meet the parsing criteria but for which a corresponding output field is not available. The Parse transformation writes a value to an overflow field when all appropriate output fields for the value are already populated.
Unparsed field
A field that contains any value that does not meet the parsing criteria that the asset defines.
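The three field types can be illustrated with a minimal Python sketch. It is not the Parse transformation's implementation; the function `parse_tokens`, its parameters, and the sample values are hypothetical and only model the routing described above, assuming a single regular expression and a fixed number of parsed data fields.

```python
import re

def parse_tokens(tokens, pattern, max_fields):
    """Route each token to a parsed field, the overflow field, or the unparsed field.

    Hypothetical model only: `pattern` stands in for the parsing criteria and
    `max_fields` for the number of parsed data fields that are available.
    """
    regex = re.compile(pattern)
    parsed, overflow, unparsed = [], [], []
    for token in tokens:
        if regex.fullmatch(token):
            if len(parsed) < max_fields:
                # The value meets the criteria and an output field is free.
                parsed.append(token)
            else:
                # The value meets the criteria, but every output field is populated.
                overflow.append(token)
        else:
            # The value does not meet the parsing criteria.
            unparsed.append(token)
    return {"parsed": parsed, "overflow": overflow, "unparsed": unparsed}

result = parse_tokens(["10001", "abc", "94105", "60601"], r"\d{5}", max_fields=2)
print(result)
```

In this sketch, the third five-digit value lands in the overflow list because both parsed fields are already populated, and `abc` lands in the unparsed list because it does not match the expression.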
Rules and guidelines for Parse transformation output fields
Consider the following rules and guidelines when you review the output fields on the transformation:
The type and number of output fields depend on the parsing operations that the user configures in the parse asset.
When the asset specifies a regular expression or a dictionary, the transformation creates one or more output fields for the data that each regular expression or dictionary parses successfully.
The user who configures the asset determines the number of output fields for each regular expression or dictionary operation. Each regular expression or dictionary operation is called a step in the asset configuration.
When the asset specifies pattern-based parsing, the transformation creates a range of output fields that represent the types of information that the pattern logic might find.
A pattern-based parsing operation can generate output fields for the following types of information:
Person names, such as first names, family names, name prefixes, and name suffixes.
Derived information, such as gender, formal greetings, and informal greetings.
Label values that represent the pattern that the parsing operation identified in the input data row. The Data Quality user can use the labels to enhance the pattern logic in the asset.
The asset logic determines the number of output fields for parsed data.
When the asset specifies a regular expression or a dictionary, the transformation may create a single overflow field for all overflow data. Alternatively, the transformation may create an overflow field for each regular expression or dictionary that the asset defines. The user sets the overflow field policy in the asset properties.
When the asset specifies a pattern parsing operation, the transformation may or may not create a single overflow field. The presence or absence of the overflow field depends on the locale that the asset specifies for the input data. For example, the Parse transformation creates an overflow field for pattern parsing operations when the asset specifies the locale as Portugal or Brazil. The Data Quality user sets the locale.
The transformation creates a single field for all unparsed data when the asset specifies a regular expression or a dictionary.
The transformation may or may not create an unparsed data field when the asset specifies a pattern parsing operation. The presence or absence of the unparsed data field in pattern-based parsing depends on the locale that the asset specifies for the input data.
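The difference between a single overflow field and one overflow field per step, as described in the guidelines above for regular expression and dictionary operations, can be sketched as follows. This is an illustrative model, not product code. The function `run_steps`, the `single_overflow` flag, the step names, and the field naming scheme are all assumptions made for the example.

```python
import re

def run_steps(tokens, steps, single_overflow=True):
    """Model regular expression steps with a configurable overflow policy.

    `steps` is a list of (name, pattern, max_fields) tuples. All names and the
    `<name>_overflow` naming scheme are hypothetical.
    """
    fields = {name: [] for name, _, _ in steps}
    if single_overflow:
        fields["overflow"] = []          # one overflow field for all steps
    else:
        for name, _, _ in steps:
            fields[f"{name}_overflow"] = []  # one overflow field per step
    fields["unparsed"] = []              # a single field for all unparsed data

    for token in tokens:
        for name, pattern, max_fields in steps:
            if re.fullmatch(pattern, token):
                if len(fields[name]) < max_fields:
                    fields[name].append(token)
                else:
                    key = "overflow" if single_overflow else f"{name}_overflow"
                    fields[key].append(token)
                break
        else:
            # No step matched the token.
            fields["unparsed"].append(token)
    return fields

steps = [("zip", r"\d{5}", 1), ("word", r"[A-Za-z]+", 1)]
tokens = ["10001", "Main", "94105", "Elm", "#4"]
shared = run_steps(tokens, steps, single_overflow=True)
per_step = run_steps(tokens, steps, single_overflow=False)
print(shared)
print(per_step)
```

With the shared policy, the surplus values from both steps collect in one overflow list. With the per-step policy, each step keeps its own overflow list. In both cases the unmatched value goes to the single unparsed list.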