User Guide

Region Splitter Job

Use the region splitter job to analyze the input tokenized data and identify the split points that distribute the tokenized data uniformly across all the regions in the repository. A uniform distribution of the tokenized data makes efficient use of resources and improves search performance.
A load clustering job uses the output files of a region splitter job to distribute the linked data. Run the region splitter job before you run the load clustering job for the first time.
The following image shows how the region splitter job identifies the split points based on the input data:
[Image: The region splitter job reads the tokenized data in HDFS and identifies the split points for all the regions.]
The region splitter job performs the following tasks:
  1. Reads the tokenized data in HDFS.
  2. Identifies the split points for the number of regions that you specify.
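
The idea behind the two tasks above can be illustrated with a minimal sketch. It is not the product's implementation: it assumes that the tokenized data can be reduced to a list of sortable row keys and that split points are chosen as evenly spaced quantiles of the sorted keys. The function names `find_split_points` and `assign_region` and the sample keys are hypothetical and used only for illustration.

```python
import bisect


def find_split_points(keys, num_regions):
    """Return num_regions - 1 keys that cut the sorted keys into near-equal parts.

    Assumption: evenly spaced quantiles of the sorted keys are used as split
    points. The actual job may choose them differently.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    ordered = sorted(keys)
    if not ordered:
        return []
    split_points = []
    for i in range(1, num_regions):
        # Index of the first key that belongs to region i.
        index = i * len(ordered) // num_regions
        split_points.append(ordered[index])
    return split_points


def assign_region(key, split_points):
    """Return the zero-based region index for a key.

    A key equal to a split point is placed in the region that starts there.
    """
    return bisect.bisect_right(split_points, key)


# Hypothetical tokenized keys, for illustration only.
sample_keys = ["AB12", "AC07", "BD44", "CA01", "CC19", "DF30", "EA02", "FZ88"]
points = find_split_points(sample_keys, num_regions=4)
regions = [assign_region(key, points) for key in sample_keys]
print(points)
print(regions)
```

With these eight sample keys and four regions, each region receives two keys. If the input contains many identical keys, the same split point can repeat, and some regions stay empty; a production job would need to handle that case.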
