Table of Contents

  1. Preface
  2. Introduction to Informatica MDM - Relate 360
  3. Linking Batch Data
  4. Tokenizing Batch Data
  5. Processing Streaming Data
  6. Creating Relationship Graph
  7. Loading Linked and Consolidated Data into Hive
  8. Searching Data
  9. Monitoring the Batch Jobs
  10. Troubleshooting
  11. Glossary

User Guide

Linking Data and Persisting the Linked Data in a Repository

You can link the input data based on the matching rules and consolidate the linked data based on the consolidation rules. You can then persist the linked and consolidated data in a repository so that you can perform data analytics or searches on the data. For example, a matching rule might link two customer records that share the same name and address into one cluster, and a consolidation rule might select the most recently updated record in the cluster as the preferred record.
The following batch jobs link the input data, consolidate the linked data, and persist the linked data in a repository:
Run the initial clustering job to create linked data in HDFS. If you want to process the linked data, run the post-clustering job. If you want to consolidate the linked data, run the consolidation job. If you want to delete data from the linked data, run the HDFS data deletion job. If you want to add incremental data to the linked data, run the initial clustering job in incremental mode.
To persist the linked data in a repository, perform the following tasks, which are summarized in the command-line sketch after this list:
  1. Run the initial clustering job.
    The job links the input data and creates clusters for the input data in HDFS.
  2. If you want to process the output files of the initial clustering job, run the post-clustering job.
    The post-clustering job reads the output files that the initial clustering job creates in HDFS and processes them based on the mode that you set.
  3. If you want to uniformly distribute the linked data across all the regions in the repository, run the region splitter job.
    The job analyzes the input linked data and identifies the split points for all the regions in the repository.
  4. Run the load clustering job.
    The job loads the linked data from HDFS into the repository.
  5. If you want to consolidate the linked data, run the consolidation job.
    The consolidation job creates a preferred records table with a preferred record for each cluster.
  6. To add incremental data to the repository, run the initial clustering job in incremental mode to link the incremental data, and then run the load clustering job in incremental mode to add the linked data to the repository.
    If you consolidate the linked data, you can also run the consolidation job in incremental mode to consolidate the incremental data.
  7. To delete records from the repository, run the repository data deletion job with the --useIndexId parameter.
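
The following command-line sketch ties the preceding steps together in one place. It is only a sketch: the script names, configuration file, paths, and all options other than --useIndexId are hypothetical placeholders, not the actual Relate 360 command syntax. See the job-specific chapters for the exact commands and parameters.

    #!/bin/bash
    # Illustrative sequence of the batch jobs described above.
    # Script names, paths, and options are hypothetical placeholders;
    # only the --useIndexId parameter comes from this guide.

    # 1. Link the input data and create clusters in HDFS.
    ./run_initial_clustering.sh --config config.xml --input /user/r360/input

    # 2. Optionally process the output files of the initial clustering job.
    ./run_post_clustering.sh --config config.xml --mode <mode>

    # 3. Optionally identify split points so that the linked data is
    #    distributed uniformly across all the regions in the repository.
    ./run_region_splitter.sh --config config.xml

    # 4. Load the linked data from HDFS into the repository.
    ./run_load_clustering.sh --config config.xml

    # 5. Optionally create a preferred records table with a preferred
    #    record for each cluster.
    ./run_consolidation.sh --config config.xml

    # 6. Add incremental data: link it, load it, and consolidate it
    #    by running the jobs in incremental mode.
    ./run_initial_clustering.sh --config config.xml --incremental
    ./run_load_clustering.sh --config config.xml --incremental
    ./run_consolidation.sh --config config.xml --incremental

    # 7. Delete records from the repository by index ID.
    ./run_repository_data_deletion.sh --config config.xml --useIndexId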
