User Guide

Creating the Required Tables in the Repository

Before you process the input data, you must create the required tables in the repository. Use the batch jobs to create these tables.
  1. If you plan to perform the linking process, perform the following tasks:
    1. Run the initial clustering job with at least one record.
    2. If you want to uniformly distribute the linked data across all the regions in the repository, run the region splitter job.
      The job analyzes the input linked data and identifies the split points for all the regions in the repository.
    3. Run the load clustering job.
      The job creates the primary key table, link table, and index table in the repository.
    4. If you want to consolidate the linked data, run the create_preferred_records_table.sh script located in the following directory:
      /usr/local/mdmbdrm-<Version Number>
      The script creates an empty preferred records table in the repository.
      Use the following command to run the create_preferred_records_table.sh script:
      create_preferred_records_table.sh --config=<Configuration file name>
      The following sample command runs the create_preferred_records_table.sh script:
      create_preferred_records_table.sh --config=/usr/local/conf/config_big.xml
    For more information about the initial clustering, region splitter, load clustering, and consolidation jobs, see the Linking Data and Persisting the Linked Data in a Repository section. A sample command sequence for these steps appears after this list.
  2. If you plan to perform the tokenization process, perform one of the following tasks:
    • Run the repository tokenization job with at least one record.
      The job creates the required tables in the repository.
    • Perform the following tasks:
      1. Run the HDFS tokenization job with at least one record.
      2. If you want to uniformly distribute the tokenized data across all the regions in the repository, run the region splitter job.
        The job analyzes the input tokenized data and identifies the split points for all the regions in the repository.
      3. Run the load clustering job.
        The job creates the required tables in the repository.
      For more information about the repository tokenization, HDFS tokenization, region splitter, and load clustering jobs, see the Tokenizing Data and Persisting the Tokenized Data in a Repository section. A sample command sketch for both alternatives also appears after this list.
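
The following sample command sequence is a minimal sketch of the linking steps above. The script names for the initial clustering, region splitter, and load clustering jobs are hypothetical placeholders, and the assumption that they accept a --config option mirrors the documented create_preferred_records_table.sh usage; check the /usr/local/mdmbdrm-<Version Number> directory for the actual job scripts in your installation.

# Hypothetical script names; only create_preferred_records_table.sh is documented above.
run_initial_clustering.sh --config=/usr/local/conf/config_big.xml
# Optional: uniformly distribute the linked data across all the regions.
run_region_splitter.sh --config=/usr/local/conf/config_big.xml
# Creates the primary key table, link table, and index table.
run_load_clustering.sh --config=/usr/local/conf/config_big.xml
# Documented above: creates an empty preferred records table for consolidation.
create_preferred_records_table.sh --config=/usr/local/conf/config_big.xml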

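Similarly, the following sample commands sketch the two alternatives for the tokenization process. All of the script names here are hypothetical placeholders, and the --config option is assumed by analogy with the documented create_preferred_records_table.sh command; check your installation directory for the actual job scripts.

# Alternative 1: repository tokenization job (hypothetical script name).
run_repository_tokenization.sh --config=/usr/local/conf/config_big.xml

# Alternative 2: HDFS tokenization, region splitter, and load clustering jobs
# (hypothetical script names).
run_hdfs_tokenization.sh --config=/usr/local/conf/config_big.xml
run_region_splitter.sh --config=/usr/local/conf/config_big.xml
run_load_clustering.sh --config=/usr/local/conf/config_big.xml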