Creating a Complex File Data Object from an Avro or Parquet Data Source
You can create a complex file data object from an Avro or Parquet data source with File or Connection as the access type. You can create a column profile on the data object.
In the Object Explorer view, select a project.
Click File > New > Data Object.
The New dialog box appears.
Select Physical Data Objects > Complex File Data Object and click Next.
The New Complex File Data Object dialog box appears.
Enter a name for the data object.
Choose Connection or File as the access type.
If you choose the Access Type as Connection, perform the following steps:
Click Browse to choose an HDFS connection.
In the Choose Connection dialog box, choose a data source, and click OK.
In the New Complex File Data Object dialog box, click Finish.
The data object appears in the project folder.
If you choose the Access Type as File and the Resource Format as Binary, perform the following steps:
Click Browse to choose an Avro or Parquet file on the local machine.
In the New Complex File Data Object dialog box, click Finish.
The data object appears in the project folder.
Select the data object in the project folder and click the Data Object Operations view.
In the Data Object Operations view, click the Read > Advanced tab.
In the Advanced tab, enter the file path of the data source on the Linux or Windows machine in the File path field.
Set the File Format to Custom Input.
In the Input Format field, enter com.informatica.avro.AvroToXML for Avro data sources or com.informatica.parquet.ParquetToXML for Parquet data sources.
When you add the input format, the Data Processor transformation converts the Avro or Parquet data source to XML format at run time.
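As a rough aid for picking the Input Format value, the following sketch inspects the leading bytes of a file. It assumes the standard file signatures (an Avro object container file starts with the bytes Obj followed by 0x01, and a Parquet file starts with PAR1) and maps each to the class name listed above. The function names detect_format and input_format_for are hypothetical and are not part of any Informatica tool.

```python
# Input Format class names as listed in the steps above.
INPUT_FORMATS = {
    "avro": "com.informatica.avro.AvroToXML",
    "parquet": "com.informatica.parquet.ParquetToXML",
}

# Standard leading bytes of each file format.
AVRO_MAGIC = b"Obj\x01"
PARQUET_MAGIC = b"PAR1"


def detect_format(path):
    """Return "avro", "parquet", or None based on the first four bytes."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head == AVRO_MAGIC:
        return "avro"
    if head == PARQUET_MAGIC:
        return "parquet"
    return None


def input_format_for(path):
    """Return the Input Format class name for an Avro or Parquet file."""
    kind = detect_format(path)
    if kind is None:
        raise ValueError(f"{path} is not a recognizable Avro or Parquet file")
    return INPUT_FORMATS[kind]
```

Calling input_format_for on a local sample file returns the string to paste into the Input Format field, or raises ValueError if the file is neither format.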
If you choose the Access Type as File and the Resource Format as Avro or Parquet, perform the following steps:
Click Browse to choose an Avro or Parquet file on the local machine.
In the New Complex File Data Object dialog box, click Finish.
The data object appears in the project folder.
After you create the data object, navigate to the Data Object Operations > Read > Advanced tab, and verify that the file path in the File path field corresponds to the data source on the Linux or Windows machine.
You can choose the Resource Format as Avro or Parquet only for flat-structured Avro and Parquet data sources.
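To judge whether an Avro source is flat before choosing the Resource Format, one option is to inspect its schema. The sketch below is an assumption about what "flat-structured" means here: a record whose fields are all Avro primitive types, or unions of primitives. It treats nested records, arrays, maps, enums, and fixed types as not flat. The product may apply a different rule, and the function names are hypothetical.

```python
import json

# Primitive type names defined by the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def is_primitive(avro_type):
    """Return True if the type is a primitive or a union of primitives."""
    if isinstance(avro_type, str):
        return avro_type in PRIMITIVES
    if isinstance(avro_type, list):
        # A JSON array denotes a union, for example ["null", "string"].
        return all(is_primitive(branch) for branch in avro_type)
    if isinstance(avro_type, dict):
        # Covers annotated primitives such as {"type": "long", "logicalType": ...}.
        return is_primitive(avro_type.get("type"))
    return False


def is_flat_record(schema):
    """Return True if the schema is a record with only primitive fields."""
    if not isinstance(schema, dict) or schema.get("type") != "record":
        return False
    return all(is_primitive(field["type"]) for field in schema.get("fields", []))


# Example with a hypothetical schema.
example = json.loads(
    '{"type": "record", "name": "Customer", "fields": ['
    '{"name": "id", "type": "long"},'
    '{"name": "email", "type": ["null", "string"]}]}'
)
flat = is_flat_record(example)
```

A schema that fails this check would, under the assumption above, be a candidate for the Binary resource format with a custom Input Format instead.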
You can choose a folder that contains multiple Avro files or multiple Parquet files to create a data object. After you create the data object, navigate to the Data Object Operations > Read > Advanced tab, and verify that the file path in the File path field points to the folder of the data sources on the Linux or Windows machine.
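Before pointing the File path field at a folder, it can help to confirm that every file in it is the same format. The sketch below assumes that a folder should hold only Avro files or only Parquet files, and identifies each file by its standard leading bytes. The function name folder_format is hypothetical, and folders that also hold marker or checksum files would need those excluded first.

```python
from pathlib import Path

# Standard leading bytes of each file format.
MAGIC = {b"Obj\x01": "avro", b"PAR1": "parquet"}


def folder_format(folder):
    """Return the single format shared by every file in the folder.

    Raises ValueError if the folder has no files, mixes Avro and
    Parquet files, or contains a file that is neither format.
    """
    found = set()
    for entry in sorted(Path(folder).iterdir()):
        if not entry.is_file():
            continue  # Subfolders are ignored in this sketch.
        with open(entry, "rb") as handle:
            head = handle.read(4)
        kind = MAGIC.get(head)
        if kind is None:
            raise ValueError(f"{entry.name} is neither Avro nor Parquet")
        found.add(kind)
    if len(found) != 1:
        raise ValueError(f"expected one format, found: {sorted(found) or 'none'}")
    return found.pop()
```

If the call returns "avro" or "parquet", the folder is consistent; a ValueError names the first problem found.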