When you run mappings on the Spark and Databricks Spark engines, you can read data from and write data to Avro, ORC, and Parquet files that are partitioned based on directories.
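As a rough illustration of what "partitioned based on directories" can look like, the following minimal sketch builds the path of a partitioned file. It assumes the Hive-style `column=value` directory naming that Spark conventionally uses. The `sales` base directory, the field names, and the `partition_path` helper are hypothetical and are not part of the product.

```python
from pathlib import PurePosixPath


def partition_path(base, record, partition_fields, file_name):
    """Build the path of a file for one record.

    Directories are nested in the order given by partition_fields,
    using the assumed Hive-style "column=value" naming.
    """
    path = PurePosixPath(base)
    for field in partition_fields:
        path = path / f"{field}={record[field]}"
    return str(path / file_name)


# Hypothetical record and partition fields, for illustration only.
record = {"country": "US", "year": 2023, "amount": 10.5}

# The same record lands in a different directory when the partition order changes.
print(partition_path("sales", record, ["country", "year"], "part-0000.parquet"))
print(partition_path("sales", record, ["year", "country"], "part-0000.parquet"))
```

Under these assumptions, the order of the partition fields determines how the directories are nested, which is why the partition order matters when you read from or write to such files.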
Importing a data object with partition files
Perform the following steps to import a data object that reads from or writes to partition files:
Select a project or folder in the Object Explorer view.
Click File > New > Data Object.
Select AmazonS3 Data Object and click Next.
The AmazonS3 Data Object dialog box appears.
Click Browse next to the Location option and select the target project or folder.
In the Resource Format list, select Avro, Parquet, or ORC.
Click Add next to the Selected Resource option to add a resource to the data object. The Add Resource dialog box appears. You can use the File Type column to distinguish between a directory and a file.
The following image shows the Add Resource dialog box, where you can select the file name and directory:
Select the check box for a directory. Click OK.
Click Finish.
The partitioned columns are displayed with the order of partitioning in the data object Overview tab.
The following image shows the data object Overview tab:
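To make the idea of partitioned columns and their order concrete, the following minimal sketch derives them from the path of a file under a selected directory. It assumes Hive-style `column=value` directory names. The `partition_columns` helper and the sample path are hypothetical, and the sketch does not describe how the Developer tool itself detects partitions.

```python
from pathlib import PurePosixPath


def partition_columns(root, file_path):
    """Return (column, value) pairs found between root and the file name.

    The pairs are returned in nesting order, which corresponds to the
    order of partitioning under the assumed "column=value" naming.
    """
    relative = PurePosixPath(file_path).relative_to(root)
    columns = []
    for part in relative.parts[:-1]:  # skip the file name itself
        name, sep, value = part.partition("=")
        if sep:  # only "column=value" directories count as partitions
            columns.append((name, value))
    return columns


# Hypothetical file under a selected "sales" directory.
for name, value in partition_columns(
    "sales", "sales/country=US/year=2023/part-0000.parquet"
):
    print(name, value)
```

In this sketch the values come back as strings, because a directory name carries no type information.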
Creating a target with partition files
Perform the following steps to create a target with partition files:
Select a project or folder in the Object Explorer view.
Select a source or a transformation in the mapping.
Right-click the Source transformation and select Create Target.
The Create Target dialog box appears.
The following image shows the Create Target option:
Select Others and then select the AmazonS3 data object from the list in the Data Object Type section.
Click OK.
The New AmazonS3 Data Object dialog box appears.
The following image shows the New AmazonS3 Data Object dialog box:
Enter a name for the data object.
Enter the partition fields.
The following image shows the Edit partition fields dialog box:
You can change the partition order using the up and down arrows.
The following image shows the partitioned fields after changing the order:
Click Finish.
The partitioned columns are displayed with the order of partitioning in the data object