Rules and guidelines for the Structure Parser transformation
Consider the following rules and guidelines when you use a Structure Parser transformation:
If you open a mapping that contains a Structure Parser transformation in the Mapping Designer and the associated intelligent structure model has changed since it was associated with the transformation, a message appears in the Mapping Designer. Click the link in the message to update the model so that it complies with the mapping.
If you update an intelligent structure model that was created with an older version of Intelligent Structure Discovery, where all output fields were assigned a string data type, the Structure Parser transformation might change field data types during the update. If, for any of the affected fields, the downstream transformation or the target expects a string, edit the model to change the data type back to string. Then, in the Mapping Designer, click the provided link to update the model in the transformation again.
For more information about editing an intelligent structure model, see Components.
The Structure Parser transformation can parse recursive elements in an XSD input file if all of the following conditions are true:
The recursive element isn't nested inside a repeating element.
The recursive element always uses the same element name.
The Structure Parser transformation isn't used in a mapping in advanced mode.
If the transformation can't parse a recursive element, it outputs the data in a single output group. The transformation can parse recursive elements in mappings created after the October 2023 release or when you reupload the schema file after upgrading to the October 2023 release.
When you use a Structure Parser transformation in a mapplet, to prevent runtime errors, be sure that the combined Mapplet transformation name doesn't exceed 80 characters. For more information, see Mapplet transformation names.
Rules and guidelines for mappings in advanced mode
Consider the following rules and guidelines when you use the Structure Parser transformation in a mapping in advanced mode:
In advanced mode, a Structure Parser transformation can process a JSON object as input, but not a JSON array.
By default in advanced mode, the schema of the source data must exactly match the schema of the intelligent structure model associated with the transformation, and the transformation can't process pass-through fields. To process pass-through fields, set the following Spark custom property in the Spark session properties for the mapping task:
Spark.MSPEnableUnassigedData=true
When you set this property, the output group contains an array called UnassignedData that contains the data that was not identified by the intelligent structure.
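Because advanced mode accepts a JSON object but not a JSON array, it can help to check the top-level type of an input file before you run the mapping. The following Python sketch is an illustration only and is not part of the product. The function names and the wrapper key "records" are hypothetical. Wrapping an array changes the shape of the data, so the intelligent structure model would need to be based on the wrapped form for the schemas to match.

```python
import json


def top_level_kind(text):
    """Return 'object', 'array', or 'other' for the top-level JSON value."""
    value = json.loads(text)
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "other"


def wrap_array(text, key="records"):
    """Wrap a top-level JSON array in an object under a hypothetical key.

    Input that is not a top-level array is returned unchanged.
    """
    value = json.loads(text)
    if isinstance(value, list):
        return json.dumps({key: value})
    return text


if __name__ == "__main__":
    sample = '[{"id": 1}, {"id": 2}]'
    print(top_level_kind(sample))              # kind of the raw input
    print(top_level_kind(wrap_array(sample)))  # kind after wrapping
```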
Rules and guidelines for output types
Consider the following rules and guidelines when you select the output type for a Structure Parser transformation:
Before you use the Parquet output type, set the HADOOP_HOME and hadoop.home.dir environment variables.
When you use an AVRO output type, the transformation doesn't generate output for input fields with identical names.
The transformation creates ORC output based on the local date and time. Processing the same input in different locations or environments might result in output with different times and time formats.
We recommend that you use a binary output type to create large XML and JSON files.
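Because the AVRO output type produces no output for input fields with identical names, you might want to look for duplicates in the list of field names before you select that output type. The following Python sketch is an illustration only. The function name and the sample field list are hypothetical, and the comparison is assumed to be an exact, case-sensitive match, which this topic does not specify.

```python
from collections import Counter


def duplicate_field_names(names):
    """Return the sorted field names that occur more than once in names."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    # Hypothetical field names taken from an input structure.
    fields = ["id", "name", "address", "id"]
    duplicates = duplicate_field_names(fields)
    if duplicates:
        print("Fields with identical names:", ", ".join(duplicates))
    else:
        print("No identical field names found.")
```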