Tokenizing Data and Persisting the Tokenized Data in a Repository
You can create fuzzy tokens for the input data based on the matching rules and persist the tokenized data in a repository so that you can perform searches on the tokenized data.
The following image shows the batch jobs that you can run to tokenize data and persist the tokenized data in a repository.
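The tokens that the jobs generate depend on the configured matching rules and are internal to the product. Purely to illustrate what a fuzzy token can look like, the following minimal Python sketch derives a simplified Soundex key, a classic phonetic code under which similar-sounding values collide. The choice of Soundex and the function name are illustrative assumptions, not the product's algorithm.

```python
def soundex(word: str) -> str:
    """Simplified Soundex phonetic key: first letter plus three digits.
    Similar-sounding words map to the same key, which is the essence of
    a fuzzy token. (Not the product's actual token-generation logic.)"""
    codes = {"b": "1", "f": "1", "p": "1", "v": "1",
             "c": "2", "g": "2", "j": "2", "k": "2",
             "q": "2", "s": "2", "x": "2", "z": "2",
             "d": "3", "t": "3", "l": "4",
             "m": "5", "n": "5", "r": "6"}
    word = word.lower()
    key = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            key += code
        prev = code
        if len(key) == 4:
            break
    return (key + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # both print R163
```

Because "Robert" and "Rupert" produce the same key, a search for either value can find records stored under the other, which is what makes token-based fuzzy search possible.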
To persist the tokenized data in a repository, run the repository tokenization job. The job creates fuzzy tokens for the input data in HDFS and loads the tokenized data into the repository.
Alternatively, perform the following tasks:
1. Run the HDFS tokenization job to create fuzzy tokens for the input data.
2. Optionally, run the region splitter job to analyze the tokenized data and identify the split points for all the regions in the repository, as illustrated in the sketch after this list.
3. Run the load clustering job to load the tokenized data into the repository.
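The region splitter job's internals are not documented here. As a hedged illustration of the general technique, the sketch below picks evenly spaced quantiles from a sorted sample of token keys as region boundaries, the usual way to pre-split regions so the load distributes evenly. The function name, sample keys, and region count are assumptions for the example.

```python
def split_points(sorted_keys: list[str], num_regions: int) -> list[str]:
    """Return num_regions - 1 boundary keys chosen at even quantiles
    of a sorted sample, so each region receives a similar share of data."""
    step = len(sorted_keys) / num_regions
    return [sorted_keys[int(i * step)] for i in range(1, num_regions)]

keys = sorted(["R163", "J520", "S530", "M240", "A450", "P362", "K530", "B625"])
print(split_points(keys, 4))  # three boundaries -> four balanced regions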
To search for records in the repository, run the repository batch search job.
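Conceptually, a search against tokenized data retrieves candidate records whose stored fuzzy tokens overlap the tokens generated for the query. The sketch below shows that retrieval pattern with an in-memory inverted index; the actual repository batch search job reads its input from the repository and applies the configured matching rules, so all names here are illustrative.

```python
from collections import defaultdict

# Hypothetical stand-in for the repository: token -> record ids.
index = defaultdict(set)
records = {1: {"R163"}, 2: {"J520", "R163"}, 3: {"S530"}}
for record_id, tokens in records.items():
    for token in tokens:
        index[token].add(record_id)

def search(query_tokens: set[str]) -> set[int]:
    """Return ids of records sharing at least one token with the query."""
    hits = set()
    for token in query_tokens:
        hits |= index[token]
    return hits

print(search({"R163"}))  # -> {1, 2}
```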
To add incremental data to the tokenized data in the repository, run the repository update job.
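An incremental update only needs to tokenize the new records and merge their tokens into the existing tokenized data, rather than re-tokenizing everything. A minimal sketch of that merge, assuming the same hypothetical inverted-index layout as above:

```python
def merge_new_tokens(index: dict[str, set], new_records: dict) -> None:
    """Merge tokens of newly tokenized records into the existing index."""
    for record_id, tokens in new_records.items():
        for token in tokens:
            index.setdefault(token, set()).add(record_id)

index = {"R163": {1, 2}}
merge_new_tokens(index, {4: {"R163", "K530"}})
print(index)  # {'R163': {1, 2, 4}, 'K530': {4}}
```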
To delete records from the tokenized data in the repository, run the repository data deletion job.
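Deletion is the inverse operation: the records' identifiers are removed from every token entry so that they no longer appear as search candidates. Again a hypothetical sketch, not the job's actual implementation:

```python
def delete_records(index: dict[str, set], record_ids: set) -> None:
    """Drop the given record ids from every token entry, and remove
    token entries that become empty."""
    for token in list(index):
        index[token] -= record_ids
        if not index[token]:
            del index[token]

index = {"R163": {1, 2, 4}, "K530": {4}}
delete_records(index, {4})
print(index)  # {'R163': {1, 2}}
```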