The repository data deletion job matches the input data in HDFS against the records in the repository, based on the column that you set as the primary key, and deletes the matching records from the repository.
The following image shows how the data is deleted from the repository when you run the repository data deletion job:
The repository data deletion job performs the following tasks:
Reads the input files from HDFS.
Matches the input data with the repository based on the column that you set as the primary key.
Deletes the matching records from the repository.
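The following is a minimal Python sketch of the tasks that the job performs: read the input, match each input row to the repository on the primary-key column, and delete the matching records. It is an illustration under stated assumptions and does not reproduce the product's implementation. The function names `read_input` and `delete_matching_records` are hypothetical. The repository is modeled as an in-memory dict keyed by primary-key value. In-memory CSV lines stand in for the input files that the real job reads from HDFS.

```python
import csv
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def read_input(lines: Iterable[str]) -> List[Record]:
    """Hypothetical stand-in for reading input files from HDFS.

    Parses CSV lines (first line is the header) into dict records.
    """
    return list(csv.DictReader(lines))


def delete_matching_records(
    input_records: Iterable[Record],
    repository: Dict[Any, Record],  # assumption: repository keyed by primary-key value
    primary_key: str,
) -> List[Any]:
    """Delete repository records whose primary key appears in the input.

    Returns the primary-key values that were actually deleted, in input order.
    """
    deleted: List[Any] = []
    for record in input_records:
        key = record.get(primary_key)
        # Skip input rows with no value in the primary-key column.
        if key is None or key == "":
            continue
        # Match on the primary-key column; pop() removes the record only if it exists,
        # so duplicate or unmatched input keys are not counted.
        if repository.pop(key, None) is not None:
            deleted.append(key)
    return deleted


if __name__ == "__main__":
    repository = {
        "1": {"id": "1", "name": "Ann"},
        "2": {"id": "2", "name": "Bob"},
        "3": {"id": "3", "name": "Cai"},
    }
    records = read_input(["id,name", "2,Bob", "4,Dan"])
    print(delete_matching_records(records, repository, primary_key="id"))
    print(sorted(repository))
```

In this sketch, the input row with key `2` matches a repository record and is deleted. The row with key `4` has no match, so the repository is left unchanged for it. Records that do not appear in the input (`1` and `3`) remain in the repository.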