Mass Ingestion Guide

Overview

Mass ingestion is the ingestion or replication of large amounts of data for use or storage in a database or a repository. The database can be a data lake, a cloud repository, or a Hadoop cluster.
To ingest or replicate large amounts of data from a relational database to a Hive or HDFS target, use the Mass Ingestion tool. In the Mass Ingestion tool, you create a mass ingestion specification.
A mass ingestion specification is a configuration that determines how a data source is ingested into a specific location in the Hive or HDFS target. In the specification, you configure the relational source and the Hive or HDFS target. You can also configure parameters to perform a light transformation on the ingested data. For example, you can filter the data to ingest only certain columns or you can mask the data to protect private information.
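As a rough illustration of what such a specification holds, the sketch below models one as plain Python data and applies a filter and a mask to a single row. It is not the Mass Ingestion tool's actual format or API: the names `MassIngestionSpec` and `apply_light_transformation`, the field names, and the hash-based masking strategy are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class MassIngestionSpec:
    """Hypothetical model of a specification; not the tool's real schema."""
    source_connection: str                 # relational source (assumed name)
    source_tables: List[str]
    target_type: str                       # "Hive" or "HDFS"
    target_location: str
    keep_columns: List[str] = field(default_factory=list)  # filter: columns to ingest
    mask_columns: List[str] = field(default_factory=list)  # mask: columns to obscure

    def __post_init__(self):
        if self.target_type not in ("Hive", "HDFS"):
            raise ValueError("target_type must be 'Hive' or 'HDFS'")


def apply_light_transformation(spec, row):
    """Keep only the configured columns, then mask the sensitive ones."""
    # An empty keep_columns list means "ingest every column".
    columns = spec.keep_columns or list(row)
    result = {}
    for name in columns:
        value = row[name]
        if name in spec.mask_columns:
            # Assumed masking strategy: a truncated SHA-256 digest of the value.
            value = hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]
        result[name] = value
    return result


spec = MassIngestionSpec(
    source_connection="sales_db",
    source_tables=["CUSTOMERS"],
    target_type="Hive",
    target_location="lake.customers",
    keep_columns=["id", "email"],
    mask_columns=["email"],
)
row = {"id": 1, "email": "a@example.com", "notes": "internal"}
masked = apply_light_transformation(spec, row)
```

In this sketch the `notes` column is dropped by the filter and the `email` value is replaced by a digest, while `id` passes through unchanged.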
Deploy and run the mass ingestion specification to ingest all of the data at once. The specification is deployed to a Data Integration Service. When you run the specification, the Data Integration Service connects to the Hadoop environment. In the Hadoop environment, the Blaze, Spark, and Hive engines ingest the data to the target. As the mass ingestion specification runs, you can begin monitoring the ingestion process.
A mass ingestion specification eliminates the need to manually create and run a separate mapping for each source. You create one mass ingestion specification that ingests all of the data at once.
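To make the contrast with hand-built mappings concrete, the sketch below expands a single specification into one ingestion task per source table. The function `plan_ingestion`, the dictionary layout, and the connection and path names are hypothetical; they do not reflect how the Data Integration Service works internally.

```python
def plan_ingestion(spec):
    """Expand one specification into one task per source table (illustrative only)."""
    tasks = []
    for table in spec["source_tables"]:
        tasks.append({
            "source": f'{spec["source_connection"]}.{table}',
            "target": f'{spec["target_location"]}/{table.lower()}',
        })
    return tasks


spec = {
    "source_connection": "sales_db",         # assumed connection name
    "source_tables": ["CUSTOMERS", "ORDERS", "INVOICES"],
    "target_location": "/data/lake/sales",   # assumed HDFS path
}

tasks = plan_ingestion(spec)
for task in tasks:
    print(task["source"], "->", task["target"])
```

Adding a table to `source_tables` adds a task without any further per-table work, which is the convenience the single specification is meant to provide.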
