User Guide

Linking Data and Persisting the Linked Data in HDFS

You can link the input data based on the matching rules and consolidate the linked data based on the consolidation rules. You can persist the linked and consolidated data in HDFS.

The following image shows the batch jobs that you can run to link data, consolidate the linked data, and persist the data in HDFS:

To persist the linked and consolidated data in HDFS, perform the following tasks:

Run the initial clustering job.

The job links the input data and creates clusters for the input data in HDFS.

If you want to process the output files of the initial clustering job, run the post-clustering job.

The post-clustering job reads the output files that the initial clustering job creates in HDFS and processes them based on the mode that you set.

If you want to consolidate the linked data, run the consolidation job.

The consolidation job creates a preferred record for each cluster.

To add incremental data to the linked data and link the incremental data, run the initial clustering job in incremental mode.

To delete records from the linked data in HDFS, run the HDFS data deletion job.
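
The order of the tasks above can be summarized in a short sketch. It is illustrative only: `LinkingOptions`, `plan_jobs`, and the job labels are hypothetical names made up for this example, not product commands or APIs. It also assumes that the optional jobs run in the order in which the tasks are listed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkingOptions:
    """Hypothetical switches, one per optional task described above."""
    post_process: bool = False    # process the initial clustering output files
    consolidate: bool = False     # create a preferred record for each cluster
    incremental: bool = False     # add and link incremental data
    delete_records: bool = False  # delete records from the linked data in HDFS


def plan_jobs(options: LinkingOptions) -> List[str]:
    """Return job labels in the order the tasks are listed (an assumption)."""
    # The initial clustering job always runs first: it links the input data
    # and creates the clusters in HDFS that the other jobs work on.
    jobs = ["initial clustering"]
    if options.post_process:
        jobs.append("post-clustering")
    if options.consolidate:
        jobs.append("consolidation")
    if options.incremental:
        jobs.append("initial clustering (incremental mode)")
    if options.delete_records:
        jobs.append("HDFS data deletion")
    return jobs


if __name__ == "__main__":
    for step, job in enumerate(plan_jobs(LinkingOptions(consolidate=True)), 1):
        print(f"{step}. {job}")
```

In this sketch only the initial clustering job is mandatory. Every other job is added only when the matching option is set, which mirrors the "if you want to" wording of the tasks.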

Linking Batch Data

Initial Clustering Job

Post-Clustering Job

Consolidation Job

HDFS Data Deletion Job
