A business entity type is a set of input data of a similar type, such as customer data, transaction data, or product data. For each type of input data, you can customize the PZMAP section of the configuration file and the match rule sets in the matching rules file. Depending on the type of data, you can either link or tokenize the input data.
For example, you can link the customer data to identify household relationships, whereas you do not need to link the transaction data. In this case, link the customer data and load the linked data into the repository, and then tokenize the transaction data and load the tokenized data into the repository.
The following image shows the batch jobs that you can run to create the relationship graph:
To create a relationship graph, perform the following tasks:
If you want to link the input data, perform the following tasks:
Run the initial clustering job.
The job links the input data and creates linked and match-pair data in HDFS.
Run the load clustering job.
The job creates required tables in the repository and loads the linked data into the tables.
Run the load match pairs job.
The job loads the match-pair data of the business entity type into the repository.
If you do not want to link the input data, run the repository tokenization job.
The job tokenizes the input data, creates required tables in the repository, and loads the tokenized data into the tables.
Similarly, process other business entity types and load the processed data into the repository.
Run the create relationship job.
The job creates a relationship between two entities.
Similarly, run the create relationship job for other entities to create the relationship graph.
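The order of the batch jobs described above can be sketched as a small planning function. This is only an illustration of the sequence, not the product's API or command line: the function name `plan_jobs`, the entity type names, and the job labels are hypothetical, and the sketch assumes that each business entity type is either linked or tokenized before any create relationship job runs.

```python
from typing import Dict, List, Tuple

# Job sequences taken from the steps above; the labels are descriptive only,
# not actual script or command names.
LINK_JOBS = ["initial clustering", "load clustering", "load match pairs"]
TOKENIZE_JOBS = ["repository tokenization"]


def plan_jobs(
    entity_types: Dict[str, bool],
    relationships: List[Tuple[str, str]],
) -> List[str]:
    """Return the ordered list of batch jobs to run.

    entity_types maps a business entity type name to True when its input
    data must be linked, or False when it only needs to be tokenized.
    relationships lists the pairs of entities to relate.
    """
    plan: List[str] = []
    # Process every business entity type and load it into the repository.
    for name, needs_linking in entity_types.items():
        jobs = LINK_JOBS if needs_linking else TOKENIZE_JOBS
        plan.extend(f"{job} ({name})" for job in jobs)
    # Create one relationship per entity pair to build the graph.
    for left, right in relationships:
        plan.append(f"create relationship ({left} -> {right})")
    return plan


# Hypothetical example: link customer data, tokenize transaction data.
plan = plan_jobs(
    {"customer": True, "transaction": False},
    [("customer", "transaction")],
)
for step in plan:
    print(step)
```

For the customer and transaction example, the plan lists the three linking jobs for the customer data, then the repository tokenization job for the transaction data, and finally the create relationship job.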