User Guide

Loading Consolidated Data from HDFS

Use the Hive enabler job to load the consolidated data that you persist in HDFS into Hive. The Hive enabler job uses the output files of a consolidation job as input.
After you load the initial consolidated data into Hive, you cannot incrementally update the consolidated data. To update the consolidated data in Hive, you must reload the consolidated data of the entire dataset into Hive.
To create the consolidated data of the entire dataset, use the output files of an initial clustering job that you run in the incremental mode with the --consolidate option as the input for the consolidation job. You can then use the output files of the consolidation job as the input for the Hive enabler job.
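The chaining of job outputs described above can be sketched as follows. This is a minimal illustration and not Relate 360 tooling: the `Job` class, the job labels, and the HDFS paths are hypothetical, and the sketch only models which job's output feeds which job's input.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Job:
    """One stage of the reload; names and paths are hypothetical placeholders."""
    name: str
    input_dir: Optional[str]   # None: the stage reads the source dataset
    output_dir: Optional[str]  # None: the stage writes to Hive, not to HDFS


def build_reload_chain(base_dir: str) -> List[Job]:
    """Return the three jobs in run order, each reading the previous job's output."""
    clustering_out = f"{base_dir}/clustering"
    consolidation_out = f"{base_dir}/consolidation"
    return [
        Job("initial clustering (incremental mode, --consolidate)", None, clustering_out),
        Job("consolidation", clustering_out, consolidation_out),
        Job("Hive enabler", consolidation_out, None),
    ]


def is_chained(jobs: List[Job]) -> bool:
    """Check that every job after the first reads the previous job's output."""
    return all(nxt.input_dir == prev.output_dir for prev, nxt in zip(jobs, jobs[1:]))


chain = build_reload_chain("/user/relate360/run1")  # hypothetical HDFS directory
for job in chain:
    print(f"{job.name}: {job.input_dir} -> {job.output_dir}")
```

A check such as `is_chained(chain)` can catch a misconfigured path before any job is submitted.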
Figure: The Hive enabler job loads the consolidated data from HDFS into Hive.
To load the consolidated data in HDFS into Hive, perform the following tasks:
  1. Run the Hive enabler job.
    The Hive enabler job loads the consolidated data in HDFS into Hive.
  2. To update the consolidated data in Hive, perform the following tasks:
    1. Run the initial clustering job in the incremental mode with the --consolidate option.
    2. Run the consolidation job that uses the output files of the initial clustering job.
    3. Drop the output table in Hive.
    4. If the <Output table>_internal table exists in Hive, drop it.
    5. Run the Hive enabler job that uses the output files of the consolidation job.
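
The two drop steps in the update procedure can be expressed as HiveQL. The following sketch generates the statements. It assumes that the output table name is a plain identifier, and the table name `consolidated_persons` is a hypothetical example. The `IF EXISTS` clause covers the case where the <Output table>_internal table is absent.

```python
import re
from typing import List

# Conservative identifier check; Hive itself accepts more forms (assumption for this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def drop_statements(output_table: str) -> List[str]:
    """Build HiveQL that drops the output table and its _internal companion.

    IF EXISTS makes the second statement a no-op when <Output table>_internal
    is absent, which matches the "if the table exists, drop it" step.
    """
    if not _IDENTIFIER.match(output_table):
        raise ValueError(f"unexpected table name: {output_table!r}")
    return [
        f"DROP TABLE IF EXISTS `{output_table}`;",
        f"DROP TABLE IF EXISTS `{output_table}_internal`;",
    ]


for statement in drop_statements("consolidated_persons"):  # hypothetical table name
    print(statement)
```

You can run the generated statements with a Hive client such as Beeline after the consolidation job completes and before you rerun the Hive enabler job.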
