The mass ingestion process uses the components of the mass ingestion architecture to create, deploy, run, and monitor a mass ingestion specification.
The mass ingestion process includes the following tasks:
Create
You create a mass ingestion specification in the Mass Ingestion tool. The Mass Ingestion Service validates and stores the specification in a Model repository.
After you create the specification, you can migrate it between Model repositories.
Deploy
You deploy the mass ingestion specification to a Data Integration Service and specify a Hadoop connection. The Mass Ingestion Service processes and deploys the specification to the Data Integration Service.
You can also deploy the mass ingestion specification to an application archive file, which saves the specification as an application. You can then import the application to the Model repository and deploy it to a Data Integration Service.
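As an alternative to deploying from the Mass Ingestion tool, you can script the deployment with the infacmd mi command line plugin. The following is a minimal sketch; the option names and values shown are assumptions, so verify the exact syntax in the infacmd mi command reference for your version:

    # Deploy the specification sales_ingest_spec to a Data Integration
    # Service, using a Hadoop connection. The service, specification,
    # and option names after -pd are placeholders, not authoritative.
    infacmd mi deploySpec -dn InfaDomain -un Administrator -pd <password> \
        -sn MassIngestionService -spec sales_ingest_spec \
        -dis DIS_Hadoop -hc hadoop_conn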
Run
You run the mass ingestion specification to ingest data to Hive or HDFS. The Mass Ingestion Service schedules the specification to run, and the Data Integration Service connects to the Hadoop environment, where the Blaze, Spark, and Hive engines ingest the data to the target.
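If you run specifications from the command line, the infacmd mi plugin also provides a runSpec command. A minimal sketch, again with assumed option names and placeholder values:

    # Run the deployed specification to ingest the source data to the
    # target. The domain, service, and specification names are placeholders.
    infacmd mi runSpec -dn InfaDomain -un Administrator -pd <password> \
        -sn MassIngestionService -spec sales_ingest_spec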
Monitor
The Mass Ingestion Service generates ingestion job statistics. You can monitor the statistics in the Mass Ingestion tool.
You can also view the statistics in the Administrator tool, where you can monitor the application and the mappings that perform the ingestion job.
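You can also retrieve run statistics from the command line. The infacmd mi plugin includes commands such as listSpecRuns and extendedRunStats; the sketch below uses assumed option names and placeholder values, so check the infacmd mi command reference before use:

    # List the runs of a specification, then fetch extended statistics
    # for one run. The <RunID> value comes from the listSpecRuns output.
    infacmd mi listSpecRuns -dn InfaDomain -un Administrator -pd <password> \
        -sn MassIngestionService -spec sales_ingest_spec
    infacmd mi extendedRunStats -dn InfaDomain -un Administrator -pd <password> \
        -sn MassIngestionService -runID <RunID>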
The following diagram illustrates the detailed mass ingestion process when you create, deploy, run, and monitor a mass ingestion specification: