The mass ingestion process uses the components of the mass ingestion architecture to create, deploy, run, and monitor a mass ingestion specification.
The mass ingestion process includes the following tasks:
1. Create a mass ingestion specification. You create the specification in the Mass Ingestion tool. The Mass Ingestion Service validates the specification and connects to the Model Repository Service, which stores the specification in a Model repository. After you create the specification, you can migrate it between Model repositories.
2. Deploy the mass ingestion specification. You deploy the specification to a Data Integration Service and specify a Hadoop connection. The Mass Ingestion Service processes the specification and deploys it to the Data Integration Service. Alternatively, you can deploy the specification to an application archive file, which saves the specification as an application. You can then import the application to a Model repository and deploy the application to a Data Integration Service.
3. Run the mass ingestion specification. You run the specification to ingest data to a Hive or HDFS target. The Mass Ingestion Service schedules the specification to run, and the Data Integration Service pushes the specification to the Spark engine in the Hadoop environment.
4. Monitor the ingestion job statistics. The Mass Ingestion Service generates statistics for the ingestion jobs. You can monitor the statistics in the Mass Ingestion tool or in the Administrator tool.
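The tasks above form a create, deploy, run, and monitor lifecycle in which a specification moves through one stage at a time. As a rough illustration only, the sketch below models that lifecycle as a simple state machine in Python. All class, state, and attribute names here are hypothetical and do not correspond to any Informatica API; it only mirrors the ordering of the tasks described above.

```python
# Illustrative sketch only: hypothetical names, not an Informatica API.
from enum import Enum, auto

class SpecState(Enum):
    CREATED = auto()    # stored in a Model repository
    DEPLOYED = auto()   # deployed to a Data Integration Service
    RUNNING = auto()    # pushed to the Spark engine
    COMPLETE = auto()   # job statistics available for monitoring

class MassIngestionSpec:
    """Hypothetical model of the specification lifecycle."""

    def __init__(self, name: str):
        self.name = name
        self.state = SpecState.CREATED
        self.statistics = {}

    def deploy(self, target: str) -> None:
        # A specification can be deployed directly to a Data Integration
        # Service, or saved to an application archive file and deployed later.
        assert self.state is SpecState.CREATED, "deploy only a created spec"
        self.target = target
        self.state = SpecState.DEPLOYED

    def run(self) -> None:
        assert self.state is SpecState.DEPLOYED, "run only a deployed spec"
        self.state = SpecState.RUNNING

    def complete(self, rows_ingested: int) -> None:
        # The service records job statistics that the tools can display.
        assert self.state is SpecState.RUNNING
        self.statistics["rows_ingested"] = rows_ingested
        self.state = SpecState.COMPLETE

# Walk a hypothetical specification through the full lifecycle.
spec = MassIngestionSpec("customer_tables")
spec.deploy("Data Integration Service")
spec.run()
spec.complete(rows_ingested=1_000_000)
print(spec.state.name, spec.statistics)  # COMPLETE {'rows_ingested': 1000000}
```

The assertions make the ordering constraint explicit: a specification must be deployed before it can run, and statistics exist only after a run completes, which matches the sequence of tasks in this section.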
The following diagram illustrates the detailed mass ingestion process when you create, deploy, run, and monitor a mass ingestion specification: