User Guide

Glossary

Hadoop Distributed File System (HDFS): A distributed file storage system that Hadoop applications use.
HBase: A nonrelational database that runs on top of HDFS.
Hive: A data warehouse infrastructure built on top of Hadoop. Hive supports an SQL-like language called HiveQL for data summarization, query, and analysis.
Kafka: A distributed messaging system that manages the input data stream.
Linking: A process of grouping related records into clusters based on the matching rules.
Matching: A process of comparing two records to identify whether they match based on the matching rules.
Population file: A file that contains rules specific to a particular population of data. The rules define how to build keys and how the search and match strategies function for the specified population.
Searching: A process of comparing the input data with the repository data to identify matching records.
Spark: A distributed real-time computation system that processes streaming data.
SSA-NAME3: A component of Relate 360 that builds keys, generates an array of search key ranges, and uses the key ranges to identify records for matching.
Storm: A distributed real-time computation system that processes streaming data.
Tokenization: A process of adding a fuzzy token, which is an encoded key, to each input record.