A complex file data object is a representation of a file in the Hadoop file system. Create a complex file data object to read structured, semi-structured, and unstructured data from HDFS or to write that data to HDFS.
You can read files from the local system or HDFS. Similarly, you can write files to the local system or HDFS. To read large volumes of data, you can connect a complex file source to read data from a directory of files that have the same format and properties. You can read and write compressed binary files.
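As a conceptual illustration only, the following Python sketch shows what it means to treat a directory of same-format files as a single source, including compressed binary files. It is not Informatica code and does not use any Informatica API. The function name `read_directory`, the `pattern` parameter, and the rule that a `.gz` suffix marks a compressed file are assumptions made for this example.

```python
import gzip
from pathlib import Path
from typing import Iterator


def read_directory(directory: str, pattern: str = "*") -> Iterator[bytes]:
    """Yield the contents of each matching file, in name order, as one logical source."""
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue  # skip subdirectories
        # Assumption for this sketch: a .gz suffix marks a gzip-compressed file.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rb") as handle:
            yield handle.read()
```

A caller iterates over the result the same way whether the directory holds one file or thousands, which is the practical benefit of pointing a source at a directory instead of at a single file.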
When you read or write industry-standard file formats, whether you need a Data Processor transformation depends on the structure of the file and on the engine that you select to run the mapping.
You can use a complex file data object with an intelligent structure model to read and parse semi-structured or structured data from text, CSV, XML, or JSON files, as well as PDF forms, Microsoft Word tables, or Microsoft Excel files. The complex file data object outputs primitive and complex elements. You do not need to use a Data Processor transformation with a complex file data object that uses an intelligent structure model. The Data Integration Service can directly read intelligent structure model resources from HDFS or the local file system.
When you use a binary complex file data object as a source, you can use a Data Processor transformation to parse the file. The output of the binary complex file data object is a binary stream. Similarly, when you write binary data to a complex file, you must use a Data Processor transformation to convert the source data into a binary format. You can then use the binary stream to write data to the binary complex file.
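The two directions described above, parsing a binary stream into records and converting records back into a binary stream, can be sketched in Python as follows. This is an analogy for the role that the Data Processor transformation plays, not its implementation or API. The record layout (a 4-byte big-endian ID, a 2-byte name length, then the UTF-8 name) and the names `to_binary` and `parse_binary` are hypothetical.

```python
import struct
from typing import Iterable, List, Tuple

Record = Tuple[int, str]

# Hypothetical layout: 4-byte unsigned ID, 2-byte unsigned name length, big-endian.
HEADER = struct.Struct(">IH")


def to_binary(records: Iterable[Record]) -> bytes:
    """Convert records into a binary stream (the write direction)."""
    out = bytearray()
    for record_id, name in records:
        encoded = name.encode("utf-8")
        out += HEADER.pack(record_id, len(encoded))
        out += encoded
    return bytes(out)


def parse_binary(stream: bytes) -> List[Record]:
    """Parse a binary stream back into records (the read direction)."""
    records: List[Record] = []
    offset = 0
    while offset < len(stream):
        record_id, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        name = stream[offset:offset + length].decode("utf-8")
        offset += length
        records.append((record_id, name))
    return records
```

In this sketch, the data object itself only moves opaque bytes. Any knowledge of the record layout lives in the conversion step, which is why a binary source or target needs a separate parsing or serializing stage.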
When you create a complex file data object, a read operation and a write operation are created. You can use the complex file data object read operation as a source in mappings and mapplets. You can use the complex file data object write operation as a target in mappings and mapplets. You can select the mapping environment and run the mappings in a native or Hadoop run-time environment.