User Guide

Glossary

Hadoop Distributed File System (HDFS)
A distributed file storage system that Hadoop applications use.
HBase
A nonrelational database that runs on top of HDFS.
Hive
A data warehouse infrastructure built on top of Hadoop. Hive supports an SQL-like language called HiveQL for data summarization, query, and analysis.
Kafka
A distributed messaging system that manages the input data stream.
Linking
A process of grouping related records into clusters based on the matching rules.
Matching
A process of comparing two records to identify whether they match based on the matching rules.
Population file
A file that contains rules specific to a particular population of data. The rules define how to build keys and how the search and match strategies function for that population.
Searching
A process of comparing the input data with the repository data to identify matching records.
Spark
A distributed real-time computation system that processes streaming data.
SSA-NAME3
A component of Relate 360 that builds keys, generates an array of search key ranges, and uses the key ranges to identify records for matching.
Storm
A distributed real-time computation system that processes streaming data.
Tokenization
A process of adding a fuzzy token, which is an encoded key, to each input record.
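
The Tokenization, Matching, and Linking entries describe three related steps, and a small sketch may help show how they fit together. The Python below is only a toy illustration under stated assumptions: it does not reproduce how SSA-NAME3 builds keys or how Relate 360 applies population rules, and the function names (`fuzzy_token`, `is_match`, `link`), the consonant-based encoding, and the 0.8 similarity threshold are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def fuzzy_token(name: str) -> str:
    """Toy tokenization: encode a name as its first letter plus the
    consonants that follow (H, W, Y and vowels skipped, adjacent repeats collapsed)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    token = [letters[0]]
    for c in letters[1:]:
        if c not in "AEIOUYHW" and c != token[-1]:
            token.append(c)
    return "".join(token)[:4]  # keep the key short, as an encoded key would be


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Toy matching rule: two records match when their similarity
    ratio reaches the (hypothetical) threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def link(names: list[str]) -> list[list[str]]:
    """Toy linking: group records into clusters when they share a
    fuzzy token and satisfy the matching rule (union-find grouping)."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    tokens = [fuzzy_token(n) for n in names]
    for i, j in combinations(range(len(names)), 2):
        # The token narrows down candidates; the rule decides the match.
        if tokens[i] == tokens[j] and is_match(names[i], names[j]):
            parent[find(j)] = find(i)

    clusters: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())
```

With the input `["Jon Smith", "John Smith", "Mary Jones"]`, this sketch gives the first two names the same token (`JNSM`), matches them, and returns two clusters: one holding both Smith records and one holding `Mary Jones`. The token acts as a cheap filter so that the costlier comparison runs only on likely candidates, which is the general idea behind key-based search strategies.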