User Guide


Loading Consolidated Data from HDFS

Use the Hive enabler job to load into Hive the consolidated data that you persist in HDFS. The Hive enabler job uses the output files of a consolidation job as input.

After you load the initial consolidated data into Hive, you cannot incrementally update the consolidated data. To update the consolidated data in Hive, you must reload the consolidated data of the entire dataset into Hive.

To create the consolidated data of the entire dataset, run an initial clustering job in the incremental mode with the --consolidate option, and use the output files of that clustering job as the input for the consolidation job. You can then use the output files of the consolidation job as the input for the Hive enabler job.
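The job chain described above can be modeled as an ordered plan in which each job reads the output files of the job before it. The following Python snippet is a minimal sketch under stated assumptions: the JobStep class, the build_reload_plan and check_chain functions, the step names, and the directory layout are all hypothetical and only illustrate the ordering. They are not the product's actual command-line syntax. The only option taken from this guide is --consolidate.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical model of the reload chain. Names, paths, and fields are
# illustrative only; they are not the product's actual CLI or API.


@dataclass
class JobStep:
    name: str
    input_dir: Optional[str]
    output_dir: str
    options: List[str] = field(default_factory=list)


def build_reload_plan(base_dir: str) -> List[JobStep]:
    """Return the ordered steps that rebuild consolidated data for the entire dataset."""
    clustering = JobStep(
        name="initial clustering (incremental mode)",
        input_dir=None,  # reads the source data; its location is out of scope here
        output_dir=f"{base_dir}/clustering",
        options=["--consolidate"],  # the option named in this guide
    )
    consolidation = JobStep(
        name="consolidation",
        input_dir=clustering.output_dir,  # consumes the clustering output files
        output_dir=f"{base_dir}/consolidation",
    )
    hive_enabler = JobStep(
        name="Hive enabler",
        input_dir=consolidation.output_dir,  # consumes the consolidation output files
        output_dir="hive",  # hypothetical marker: the target is Hive, not an HDFS directory
    )
    return [clustering, consolidation, hive_enabler]


def check_chain(plan: List[JobStep]) -> bool:
    """Verify that every step reads the output files of the step before it."""
    return all(cur.input_dir == prev.output_dir for prev, cur in zip(plan, plan[1:]))


if __name__ == "__main__":
    plan = build_reload_plan("/user/mdm/reload")  # hypothetical HDFS base path
    for step in plan:
        print(f"{step.name}: {step.input_dir} -> {step.output_dir} {' '.join(step.options)}")
    print("chain ok:", check_chain(plan))
```

Because the consolidated data in Hive cannot be updated incrementally, every update in this model means running the whole plan again from the initial clustering job, rather than re-running only the Hive enabler step.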

The following image shows how the consolidated data in HDFS is loaded into Hive: