Your organization needs to convert a large volume of customer data in a flat file to struct data and write it to an Avro file. The input file contains customer details such as name, age, and phone numbers. If the customer name is null in the input file, you do not want to add customer details to the output file.
You can develop a mapping with a Java transformation to define the transformation functionality. In the Hadoop environment, run the mapping on the Spark engine to transform the data and write the struct data to an Avro file.
Create a mapping and configure the following transformations:
Read transformation that reads customer information from a flat file source
Active Java transformation that converts the flat data to struct data and drops rows in which the customer name is null
Write transformation that writes the struct data to an Avro file
The following image shows the mapping with a Read transformation, a Java transformation, and a Write transformation.
On the type definition library tab of the mapping editor, rename the type definition library to CustomerInfo and create a complex data type definition named Customer. The complex data type definition represents the schema of the struct data. Add the following elements to the complex data type definition:
name of type string
age of type integer
phones of type array with string elements
The following image shows the complex data type definition in the type definition library:
In the Java transformation, add a struct output port and configure the port type to reference the complex data type definition that you created. The Java transformation generates a class Customer with getters and setters to read and set the member fields. The class contains the following member fields:
_name
_age
_phones
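As a rough illustration of the shape of such a class, the following plain Java sketch declares the three member fields with matching getters and setters. It is not the code that the Java transformation generates: the accessor names (getName, setName, and so on) and the use of Integer and List<String> as field types are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the class that the Java transformation generates may differ.
class Customer {
    // Member fields that mirror the elements of the complex data type definition.
    private String _name;          // name of type string
    private Integer _age;          // age of type integer (boxed so that it can hold null)
    private List<String> _phones;  // phones of type array with string elements

    // Accessor names are assumed for illustration.
    public String getName() { return _name; }
    public void setName(String name) { this._name = name; }

    public Integer getAge() { return _age; }
    public void setAge(Integer age) { this._age = age; }

    public List<String> getPhones() { return _phones; }
    public void setPhones(List<String> phones) {
        // Copy the list so that later changes by the caller do not affect the struct.
        this._phones = (phones == null) ? null : new ArrayList<>(phones);
    }
}
```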
The following image shows the class created for the struct port in the Full Code tab of the Java view:
The Java data type for the struct port uses the name of the type definition library and complex data type definition. The following image shows the Java data type name CustomerInfo.Customer for the cust field in the generated code:
In the Java view of the Java transformation, import any third-party, built-in, or custom Java packages that the transformation requires. Write and compile the Java code to convert the flat data into struct data and to remove the customer row if the customer name is null.
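The conversion and filtering logic described above can be sketched in plain Java outside the Developer tool. The sketch is a standalone simulation under stated assumptions, not the code that you enter in the Java view: the nested CustomerInfo.Customer class stands in for the generated struct type, the FlatRow class and the semicolon-delimited phones field are hypothetical, and adding a struct to the returned list plays the role of emitting an output row from the active transformation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for the type definition library and its complex data type definition.
final class CustomerInfo {
    static final class Customer {
        private String _name;
        private Integer _age;
        private List<String> _phones;

        public String getName() { return _name; }
        public void setName(String name) { _name = name; }
        public Integer getAge() { return _age; }
        public void setAge(Integer age) { _age = age; }
        public List<String> getPhones() { return _phones; }
        public void setPhones(List<String> phones) { _phones = phones; }
    }
}

// Hypothetical flat input row: one record from the flat file.
final class FlatRow {
    final String name;
    final Integer age;
    final String phones; // assumed to be semicolon-delimited, for example "555-0100;555-0101"

    FlatRow(String name, Integer age, String phones) {
        this.name = name;
        this.age = age;
        this.phones = phones;
    }
}

final class OnInputSketch {
    // Converts flat rows to structs and drops rows whose customer name is null.
    static List<CustomerInfo.Customer> convert(List<FlatRow> rows) {
        List<CustomerInfo.Customer> output = new ArrayList<>();
        for (FlatRow row : rows) {
            if (row.name == null) {
                continue; // no output row is produced, so the customer is filtered out
            }
            // Split the flat phones field into the string elements of the array.
            List<String> phones = new ArrayList<>();
            if (row.phones != null) {
                phones.addAll(Arrays.asList(row.phones.split(";")));
            }
            CustomerInfo.Customer cust = new CustomerInfo.Customer();
            cust.setName(row.name);
            cust.setAge(row.age);
            cust.setPhones(phones);
            output.add(cust); // stands in for emitting a row with the struct port set
        }
        return output;
    }
}
```

Because the transformation is active, the number of output rows can differ from the number of input rows. In this sketch, three input rows of which one has a null name yield two structs. In the transformation itself, the Java transformation API takes the place of the list (for example, generateRow emits a row in an active transformation); check the product documentation for exact usage.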
The following image shows the code in the On Input tab:
Validate the mapping and run the mapping on the Spark engine to write the transformed data to the Avro file output.