Mass Ingestion Guide

HDFS Target

Configure an HDFS target to ingest source data to a flat file on HDFS.
When you configure the mass ingestion specification to ingest data to an HDFS target, you configure an HDFS connection and an ingestion directory to define the target.
If you enable incremental load in the definition of the mass ingestion specification, you must configure incremental load options for the HDFS target to select a mode to ingest the data.
The following image shows the Target page for an HDFS target:
This screenshot shows the Target page of the mass ingestion specification for an HDFS target. On the Target page, you can configure properties to define the HDFS target. The bottom of the page shows a section for Incremental Load Options. In the top-right corner, you can click Next to go to the next page or X to discard the specification.
The following table describes the properties that you can configure to define the HDFS target:
Property
Description
Target Connection
Required. The HDFS connection used to find the HDFS storage target.
If changes are made to the available HDFS connections, refresh the browser or log out and log back in to the Mass Ingestion tool.
Target Table Prefix
The prefix added to the names of the target files.
Enter a string. You can enter alphanumeric and underscore characters. The prefix is not case sensitive.
Target Table Suffix
The suffix added to the names of the target files.
Enter a string. You can enter alphanumeric and underscore characters. The suffix is not case sensitive.
Ingestion Directory
Required. The target directory on HDFS. A sub-directory is created under the ingestion directory for each source that is ingested.
If the specified directory already exists, the directory is replaced.
For example, you can enter /temp. A source table named PRODUCT is ingested to the directory /temp/PRODUCT/.
Compression
Required. The compressed file format that stores the target files. You can select None, Gzip, Bzip2, LZO, Snappy, or Custom. If you select Custom, enter the compression codec. Default is None.
Compression Codec
If you select custom compression, enter the fully qualified class name implementing the Hadoop CompressionCodec interface.
Delimiters
The delimiters used to separate data in the target files. You can select comma, semicolon, space, tab, or other. If you select Other, you can define a custom delimiter.
Other Delimiter
Required if you choose Other for the delimiter. Enter a custom delimiter.
Mode
Required if you enable incremental load. Select Append or Overwrite. Append mode appends the incremental data to the target. Overwrite mode overwrites the data in the target with the incremental data. Default is Append.
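The Ingestion Directory and Mode properties in the table above can be illustrated with a minimal Python sketch. It is not part of the Mass Ingestion tool: the function names target_directory and apply_incremental_load are hypothetical, and the in-memory lists only model the described behavior rather than reproduce how the Data Integration Service writes to HDFS.

```python
import posixpath


def target_directory(ingestion_dir: str, source_name: str) -> str:
    """Return the sub-directory for one source under the ingestion directory.

    Hypothetical helper: it only mirrors the documented example, where
    source PRODUCT ingested to /temp lands in /temp/PRODUCT/.
    """
    # HDFS paths use forward slashes, so posixpath is used on any local OS.
    return posixpath.join(ingestion_dir, source_name) + "/"


def apply_incremental_load(existing_rows, incremental_rows, mode="Append"):
    """Model the two incremental load modes on in-memory lists of rows."""
    if mode == "Append":
        # Append keeps the existing target data and adds the new rows.
        return list(existing_rows) + list(incremental_rows)
    if mode == "Overwrite":
        # Overwrite discards the existing target data.
        return list(incremental_rows)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(target_directory("/temp", "PRODUCT"))
    print(apply_incremental_load(["row1"], ["row2"], mode="Append"))
    print(apply_incremental_load(["row1"], ["row2"], mode="Overwrite"))
```

Append is the default in the sketch because the table lists Append as the default mode.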
When the Data Integration Service stores temporary files that you ingest to an HDFS target, it appends a unique ID to the original file name. The resulting file name can have a maximum length of 255 characters.
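To check ahead of time whether a source file name leaves room for the appended ID, a rough sketch such as the following may help. The guide does not state the format or length of the unique ID, so the 36-character length used in the usage lines is an assumption for illustration only, and both function names are hypothetical.

```python
MAX_FILE_NAME_LENGTH = 255  # maximum resulting file name length stated in the guide


def max_original_length(unique_id_length: int) -> int:
    """Return how long the original file name may be for a given ID length."""
    return max(MAX_FILE_NAME_LENGTH - unique_id_length, 0)


def fits_with_unique_id(original_name: str, unique_id: str) -> bool:
    """Check whether the original name plus the appended ID stays within the limit."""
    return len(original_name) + len(unique_id) <= MAX_FILE_NAME_LENGTH


if __name__ == "__main__":
    assumed_id_length = 36  # assumption: the real ID length is not documented here
    print(max_original_length(assumed_id_length))
    print(fits_with_unique_id("PRODUCT.csv", "x" * assumed_id_length))
```

The sketch counts characters, as the guide does. If the underlying file system limits names by bytes, names with multi-byte characters would need a stricter check.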