When you run a mapping on the Spark engine, you can read data from and write data to Avro, ORC, and Parquet files that are partitioned based on directories. You must import a directory that contains only partition folders and select Directory as the source type in the advanced read properties.
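Directory-based partitioning typically follows the Hive-style convention, where each level of nesting is a `field=value` folder and the nesting order is the partition order. As a minimal sketch (the folder and column names here are illustrative, not taken from any specific mapping), the following stdlib-only Python infers the partition columns and their order from such a directory tree:

```python
from pathlib import Path
import tempfile

def partition_columns(root: Path) -> list[str]:
    """Walk the first branch of key=value folders and return the
    partition column names in nesting order."""
    cols = []
    current = root
    while True:
        subdirs = [d for d in current.iterdir() if d.is_dir() and "=" in d.name]
        if not subdirs:
            break
        cols.append(subdirs[0].name.split("=", 1)[0])
        current = subdirs[0]
    return cols

# Build a toy partitioned layout:
#   sales/country=US/state=CA/part-0000.parquet
root = Path(tempfile.mkdtemp()) / "sales"
(root / "country=US" / "state=CA").mkdir(parents=True)
(root / "country=US" / "state=CA" / "part-0000.parquet").touch()
print(partition_columns(root))  # ['country', 'state']
```

This is why the imported directory must contain only partition folders: a stray file or non-`key=value` folder at any level would make the partition scheme ambiguous.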
Importing a data object with partition files
Perform the following steps to import a data object that reads from or writes to partition files:
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Data Object.
3. Select Microsoft Azure Data Lake Storage Gen2 Data Object and click Next.
   The Microsoft Azure Data Lake Storage Gen2 Data Object dialog box appears.
4. Click Browse and select the target project or folder.
5. In the Resource Format list, select Avro, Parquet, or ORC.
6. Click Add to add a resource to the data object.
   The Add Resource dialog box appears. You can use the File Type column to distinguish between a directory and a file.
7. Select the check box for a directory and click OK.
8. Click Finish.
   The partitioned columns are displayed, in the order of partitioning, on the data object Overview tab.
Creating a target with partition files
Perform the following steps to create a target with partition files:
1. Select a project or folder in the Object Explorer view.
2. Select a source or a transformation in the mapping.
3. Right-click the Source transformation and select Create Target.
   The Create Target dialog box appears.
4. Select Others and then select the Microsoft Azure Data Lake Storage Gen2 data object from the list in the Data Object Type section.
5. Click OK.
   The New Microsoft Azure Data Lake Storage Gen2 Data Object dialog box appears.
6. Enter a name for the data object.
7. Enter the partition fields.
   The following image shows the Edit partition fields dialog box:
   You can change the partition order using the up and down arrows.
8. Click Finish.
   The partitioned columns are displayed, in the order of partitioning, on the data object Overview tab.
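The partition order you choose determines how the written files are nested on disk: the first partition field becomes the outermost folder. As a rough sketch of that mapping (the field names and values are hypothetical; the actual files are written by the Spark engine), the target path for a given row follows the `field=value` nesting:

```python
def partition_path(row: dict, partition_fields: list[str]) -> str:
    """Build the Hive-style nested directory path for one row,
    following the chosen partition-field order."""
    return "/".join(f"{field}={row[field]}" for field in partition_fields)

row = {"country": "US", "state": "CA", "amount": 10.0}
print(partition_path(row, ["country", "state"]))  # country=US/state=CA
print(partition_path(row, ["state", "country"]))  # state=CA/country=US
```

Reordering the fields with the up and down arrows therefore changes the directory layout of the target, which can matter for downstream readers that prune partitions on the outermost folders first.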