Use the Hive enabler job to load the consolidated data that you persist in HDFS into Hive. The Hive enabler job uses the output files of a consolidation job as input.
After you load the initial consolidated data into Hive, you cannot incrementally update the consolidated data. To update the consolidated data in Hive, you must reload the consolidated data of the entire dataset into Hive.
To create the consolidated data of the entire dataset, use the output files of an initial clustering job that you run in the incremental mode with the --consolidate option as the input for the consolidation job. You can then use the output files of the consolidation job as the input for the Hive enabler job.
The following image shows how the consolidated data in HDFS is loaded into Hive:
To load the consolidated data in HDFS into Hive, perform the following tasks:
Run the Hive enabler job.
The Hive enabler job loads the consolidated data in HDFS into Hive.
To update the consolidated data in Hive, perform the following tasks:
Run the initial clustering job in the incremental mode with the --consolidate option.
Run the consolidation job that uses the output files of the initial clustering job.
Drop the output table in Hive.
If the <Output table>_internal table exists in Hive, drop it.
Run the Hive enabler job that uses the output files of the consolidation job.
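The two drop steps in the update procedure can be sketched as a small Python helper that builds the HiveQL statements. This is only an illustration under stated assumptions: the function name drop_statements and the table name consolidated_output are hypothetical, and the identifier check is a deliberately conservative choice rather than a rule taken from the product. DROP TABLE IF EXISTS is standard HiveQL, so the second statement does nothing when the <Output table>_internal table does not exist.

```python
import re
from typing import List

# Conservative pattern (an assumption): a plain or database-qualified identifier.
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def drop_statements(output_table: str) -> List[str]:
    """Build HiveQL that drops the output table and its _internal companion."""
    if not _TABLE_NAME.fullmatch(output_table):
        raise ValueError(f"unexpected table name: {output_table!r}")
    return [
        # Drop the output table that the Hive enabler job created.
        f"DROP TABLE IF EXISTS {output_table}",
        # Drop <Output table>_internal only if it exists.
        f"DROP TABLE IF EXISTS {output_table}_internal",
    ]


if __name__ == "__main__":
    # "consolidated_output" is a hypothetical table name.
    for statement in drop_statements("consolidated_output"):
        print(statement + ";")
```

You can run the printed statements in any Hive client before you rerun the Hive enabler job.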