Create a connection to access data in files stored in the relevant system. You can create the following types of connections to work with data objects that incorporate an intelligent structure model:
Hadoop Distributed File System
Amazon S3
Microsoft Azure Blob
Create a data object with an intelligent structure model to read and parse source data.
Create a data object with an intelligent structure model to represent the files stored as sources. You can create the following types of data objects with an intelligent structure model:
complex file
Amazon S3
Microsoft Azure Blob
Configure the data object properties.
In the read data object operation, enable the column file properties to project columns in the files as complex data types.
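Conceptually, an intelligent structure model maps semi-structured input to typed columns, with nested structures projected as complex data types. The following plain-Python sketch is an illustration only, not Informatica's parser; the input format (JSON Lines) and field names are hypothetical:

```python
import json

def parse_record(line):
    """Parse one semi-structured (JSON Lines) record into typed columns.

    Nested objects are kept as struct-like values, analogous to projecting
    file columns as complex data types in the read data object operation.
    """
    raw = json.loads(line)
    return {
        "id": int(raw["id"]),              # primitive column
        "name": str(raw["name"]),          # primitive column
        "address": raw.get("address", {}), # complex (struct) column
    }

sample = '{"id": "7", "name": "Ada", "address": {"city": "Austin", "zip": "78701"}}'
record = parse_record(sample)
```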
Create a data object to write data to a target.
Create a data object to write the data to target storage.
Configure the data object properties.
Create a mapping and add mapping objects.
Create a mapping.
Add a Read transformation based on the data object with the intelligent structure model.
Add any other transformations that the Spark engine supports, then link the ports and configure the transformation properties according to the mapping logic.
Add a Write transformation based on the data object that passes the data to the target storage or output. Link the ports and configure the transformation properties based on the mapping logic.
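The Read, transform, and Write steps above can be sketched as a single flow. This is a conceptual plain-Python illustration, not the Spark engine execution; the JSON Lines input, the uppercasing transformation, and the CSV target are assumptions for the example:

```python
import csv
import io
import json

def run_pipeline(source_lines, target):
    """Read semi-structured records, apply a transformation, write to a target.

    Mirrors the Read -> transform -> Write structure of the mapping.
    """
    writer = csv.writer(target)
    writer.writerow(["id", "city"])            # target schema
    for line in source_lines:
        rec = json.loads(line)                 # Read: parse the source record
        city = rec["address"]["city"].upper()  # transform: example expression
        writer.writerow([rec["id"], city])     # Write: pass rows to the target

source = [
    '{"id": 1, "address": {"city": "Austin"}}',
    '{"id": 2, "address": {"city": "Boston"}}',
]
out = io.StringIO()
run_pipeline(source, out)
```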
Configure the mapping to run on the Spark engine.
Configure the following mapping run-time properties:
Select Hadoop as the validation environment and Spark as the engine.
Select Hadoop as the execution environment and select a Hadoop connection.
Validate and run the mapping on the Spark engine.
Validate the mapping and fix any errors.
Optionally, view the Spark engine execution plan to debug the logic.