Mass Ingestion Guide

Full Load

Use a full load to ingest all of the data in the mass ingestion specification to the target. When you use a full load, the existing data in the Hive or HDFS target is deleted and replaced with the data in the source tables.
You might want to run a full load for any of the following reasons:
As a prerequisite for running incremental loads.
When you create a mass ingestion specification, run an initial full load before you begin running incremental loads on the data. The initial full load allows the Spark engine to create a basis to fetch incremental data in subsequent runs.
An initial full load can also help administrators maintain self-documented records. For example, it is possible to run an incremental load using overwrite mode as the first run of the specification, but the Spark engine does not have a basis to fetch incremental data. As a result, the Spark engine ingests all of the data from the source and effectively runs a full load. The records would indicate that a user ran an incremental load, but it might be unclear whether all data or only incremental data was ingested to the target.
If you run an initial full load followed by subsequent incremental loads, the administrator can distinguish whether the Spark engine ingested all data or only incremental data for each run of the specification.
To update the basis for incremental loads.
Run a full load to update the target based on UPSERT and DELETE statements that have been run against the relational database.
If you run an incremental load, the Spark engine fetches the rows that have been added to a relational table using INSERT statements. The Spark engine cannot fetch the rows that have been changed by UPSERT and DELETE statements, so an incremental load from the relational database might not provide an accurate representation of the source data.
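The following Python sketch illustrates why a periodic full load is needed. It is a simplified in-memory model, not the Spark engine's implementation: the function names `full_load` and `incremental_load` are hypothetical, and the sketch assumes that the basis is the highest row ID seen so far, which this guide does not specify.

```python
# Simplified, hypothetical model of full and incremental loads.
# Assumption: the "basis" is the highest row ID ingested so far.

def full_load(source_rows):
    """Replace the target with a complete copy of the source."""
    target = dict(source_rows)
    basis = max(source_rows, default=0)
    return target, basis

def incremental_load(source_rows, target, basis):
    """Ingest only rows whose ID is above the basis (rows added by INSERT)."""
    updated = dict(target)
    for row_id, value in source_rows.items():
        if row_id > basis:
            updated[row_id] = value
    new_basis = max(max(source_rows, default=basis), basis)
    return updated, new_basis

# Initial full load creates the basis for later incremental runs.
source = {1: "alice", 2: "bob"}
target, basis = full_load(source)

# Changes made against the source after the initial full load.
source[3] = "carol"    # inserted row: an incremental load fetches it
source[1] = "alicia"   # changed row: an incremental load misses it
del source[2]          # deleted row: an incremental load misses it

target, basis = incremental_load(source, target, basis)
stale = target != source      # the target no longer mirrors the source

# A full load replaces the target and refreshes the basis.
target, basis = full_load(source)
in_sync = target == source
```

In this model, the incremental run picks up the inserted row but leaves the changed and deleted rows untouched in the target, so the target drifts from the source until the next full load replaces it.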
