You are a data steward at a retail bank that has multiple branches. You manage a master set of the customer account records from all of the branches. You use a set of index database tables to verify that the customer account database does not contain redundant or duplicate records.
To create and manage the index data store, you perform the following operations:
You create the data store.
You update the data store with the most recent data from the bank branches.
You might add account data to the data store, or you might update the current data in the data store.
You remove obsolete records from the data store.
You understand that each operation might create duplicate records in the data store. You decide to develop a policy to analyze the branch data before you add the data to the master data store. You use identity match analysis to analyze the branch data and to verify that the data does not create duplicate identities in the data store. You configure the persistent index options on the Match transformation to analyze the branch data and the data store.
Develop a Policy for Persistent Index Data Management
As a data steward, you define a business rule that states that the customer account data store cannot contain duplicate identities. You design an identity match mapping to analyze the branch data in a staging database before you add the data to the data store.
The operations to add the branch data to the data store can create duplicate identities in the following cases:
The branch data contains duplicate identities.
The branch data contains an identity that the index also contains.
The branch data contains a newer version of an identity in the data store, and the newer version matches another identity in the index.
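The three cases above can be sketched in Python. This is an illustrative model only, not the identity match algorithm that the Match transformation uses: `identity_key` is a hypothetical stand-in that treats two records as the same identity when their normalized names are equal, and the record layout (a sequence identifier mapped to a name) is an assumption.

```python
def identity_key(name):
    """Hypothetical stand-in for identity matching: ignore case and extra spaces."""
    return " ".join(name.lower().split())


def duplicate_cases(branch, index):
    """Return the branch sequence IDs that fall into each duplicate case."""
    cases = {"within_branch": [], "already_in_index": [], "update_collides": []}
    seen = set()
    for seq_id, name in branch.items():
        key = identity_key(name)
        # Case 1: the branch data itself contains the identity more than once.
        if key in seen:
            cases["within_branch"].append(seq_id)
        seen.add(key)
        # Index rows that hold the same identity under a different sequence ID.
        others = [i for i, n in index.items() if i != seq_id and identity_key(n) == key]
        if others and seq_id in index:
            # Case 3: a newer version of an indexed row matches another indexed identity.
            cases["update_collides"].append(seq_id)
        elif others:
            # Case 2: a new row carries an identity that the index already contains.
            cases["already_in_index"].append(seq_id)
    return cases


index = {1: "Ann Lee", 2: "Bob Roy", 6: "Dee Fox"}
branch = {3: "ann  lee", 4: "Cy Dunn", 5: "cy dunn", 2: "dee fox"}
result = duplicate_cases(branch, index)
```

In this sample, row 5 duplicates row 4 inside the branch data, row 3 repeats an identity that the index holds as row 1, and the newer version of row 2 collides with indexed row 6.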
When you compare the staging database to the data store, select the persistent index options that reflect the duplicate record status of the branch data. Before you update the data store, you might decide to compare the branch data with the index data.
You can enable and disable match analysis on some of the options. Enable match analysis to analyze the mapping data or to compare the index data store to the mapping data. Disable match analysis when you do not need to compare the data. You can also use the Match properties on the Match Output tab to include or exclude data from match analysis.
Compare a Mapping Data Source with the Index Data Store
To compare the mapping input data with the index data store and to make no change to the data store, select the following option:
Do not update the database
The mapping compares the input data to the index data store. The mapping does not add, remove, or update any data in the index data store.
You cannot disable identity match analysis when you select the option.
Because you do not update the index data, you cannot create duplicate rows in the store. Select the option from the Match properties on the Match Output tab that meets the current needs of the data project. For example, select the Full option. The Full option verifies that the mapping data does not contain duplicates and verifies that the mapping data does not add duplicates to the data store.
Use the Do not update the database option to compare the mapping data and the data store before you update the data store. If the mapping output indicates that the mapping data does not add duplicates to the data store, run the mapping again with an option that updates the database.
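The compare-first, update-second workflow might be modeled as follows. This is a minimal sketch under stated assumptions: `identity_key`, `compare_only`, and `load_if_clean` are hypothetical names, name equality after normalization stands in for identity matching, and a plain dictionary keyed by sequence identifier stands in for the index tables.

```python
def identity_key(name):
    """Hypothetical stand-in for identity matching: ignore case and extra spaces."""
    return " ".join(name.lower().split())


def compare_only(mapping_rows, index_rows):
    """Dry run: report matches inside the mapping data and against the index.

    Neither input is modified, which mirrors a run that does not update the database.
    """
    matches = []
    items = list(mapping_rows.items())
    for pos, (seq_id, name) in enumerate(items):
        key = identity_key(name)
        # Duplicates within the mapping data itself.
        for other_id, other_name in items[:pos]:
            if identity_key(other_name) == key:
                matches.append(("mapping", other_id, seq_id))
        # Duplicates between the mapping data and the index.
        for index_id, index_name in index_rows.items():
            if identity_key(index_name) == key:
                matches.append(("index", index_id, seq_id))
    return matches


def load_if_clean(mapping_rows, index_rows):
    """First pass compares only; the index is updated only when no match is found."""
    matches = compare_only(mapping_rows, index_rows)
    if not matches:
        index_rows.update(mapping_rows)
    return matches


index = {1: "Ann Lee"}
first_try = load_if_clean({2: "ann lee", 3: "Bo Ray"}, index)   # row 2 matches row 1
index_after_first = dict(index)
second_try = load_if_clean({3: "Bo Ray"}, index)                # clean, so it loads
```

The first call reports a match and leaves the index untouched. The second call finds nothing to report and adds the row.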
Create the Data Store and Add Rows to the Data Store
To create a data store or to add rows from the mapping data to a data store, select the following option:
Update the database with new IDs
The mapping adds a row to the data store if the row does not share a sequence identifier with a row in the data store. The mapping does not overwrite any row in the index tables. When you specify empty database tables, the mapping writes all of the mapping index data to the tables.
You can enable or disable identity match analysis when you select the option. The option enables match analysis by default.
Because you do not update the index rows, select the Exclusive option or the Partial option from the Match properties on the Match Output tab. Use the Exclusive option if you verified the uniqueness of the mapping data rows in an earlier process.
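The add-only behavior of this option can be illustrated with a short Python sketch. It assumes a dictionary keyed by sequence identifier in place of the index tables, and `update_with_new_ids` is a hypothetical name, not a product API.

```python
def update_with_new_ids(index_rows, mapping_rows):
    """Add mapping rows whose sequence ID is absent from the index.

    Existing index rows are never overwritten. Returns the IDs that were added.
    """
    added = []
    for seq_id, row in mapping_rows.items():
        if seq_id not in index_rows:
            index_rows[seq_id] = row
            added.append(seq_id)
    return added


# An existing store keeps its current version of row 1 and gains row 2.
store = {1: "Ann Lee"}
added = update_with_new_ids(store, {1: "Ann Leigh", 2: "Bo Ray"})

# An empty store receives every mapping row.
new_store = {}
added_to_empty = update_with_new_ids(new_store, {1: "Ann Leigh", 2: "Bo Ray"})
```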
Update the Rows in the Data Store
To update a current row in the data store with the mapping data, select the following option:
Update the current IDs in the database
The mapping updates a current record in the data store if the record shares a sequence identifier with a record in the mapping data. The mapping does not add any row to the index tables.
You can enable or disable identity match analysis when you select the option. The option disables match analysis by default.
Because you do not add index rows to the index tables, select the Full option from the Match properties on the Match Output tab.
When you update the rows in the data store, you expect to find duplicates between the mapping source data and the data store. Select the Full option to verify that the identity data that you add to the store does not match the current data in the store.
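As a rough illustration, the update-only behavior and the follow-up check might look like the sketch below. The function names are hypothetical, a dictionary keyed by sequence identifier stands in for the index tables, and normalized name equality is an assumed substitute for identity matching.

```python
def identity_key(name):
    """Hypothetical stand-in for identity matching: ignore case and extra spaces."""
    return " ".join(name.lower().split())


def update_current_ids(index_rows, mapping_rows):
    """Overwrite rows whose sequence ID already exists in the index; add nothing."""
    updated = [seq_id for seq_id in mapping_rows if seq_id in index_rows]
    for seq_id in updated:
        index_rows[seq_id] = mapping_rows[seq_id]
    return updated


def collisions(index_rows, seq_ids):
    """Pairs (updated_id, other_id) where an updated row matches another stored identity."""
    pairs = []
    for seq_id in seq_ids:
        key = identity_key(index_rows[seq_id])
        for other_id, other_name in index_rows.items():
            if other_id != seq_id and identity_key(other_name) == key:
                pairs.append((seq_id, other_id))
    return pairs


store = {1: "Ann Lee", 2: "Bob Roy"}
# Row 9 has no counterpart in the store, so it is ignored rather than added.
updated = update_current_ids(store, {2: "ann lee", 9: "New Row"})
found = collisions(store, updated)
```

Here the newer version of row 2 now matches row 1, which is the kind of duplicate that the comparison after an update is meant to catch.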
Remove Rows from the Data Store
To remove rows from the data store, select the following option:
Remove IDs from the database
The mapping deletes a row from the data store if the row shares a sequence identifier with a record in the mapping data.
You can enable or disable identity match analysis when you select the option. The option disables match analysis by default.
When you remove data from a data store, you change the relationships between the rows in the store. If the store contains duplicate identities, you might remove data for a driver record or a linked record in a cluster. Or, you might remove data for the best match in a matched pair. When you run the mapping again, the mapping might generate different clusters or duplicate pairs. If you remove rows from a data store that does not contain duplicate records, you cannot change the duplicate status of the records. When you run the mapping after you delete the rows, the mapping generates the same match scores for the identities that remain in the data set.
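The effect of a removal on clusters can be shown with a small Python model. Everything here is an assumption made for illustration: the `score` function uses token overlap in place of a real identity match score, the 0.5 threshold is arbitrary, and clusters are taken to be groups of records connected by links at or above the threshold.

```python
from itertools import combinations


def score(a, b):
    """Hypothetical match score: Jaccard overlap of the name tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


def clusters(rows, threshold=0.5):
    """Group sequence IDs that are connected by scores at or above the threshold."""
    parent = {seq_id: seq_id for seq_id in rows}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(rows, 2):
        if score(rows[a], rows[b]) >= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for seq_id in rows:
        groups.setdefault(find(seq_id), set()).add(seq_id)
    return sorted(sorted(group) for group in groups.values())


def remove_ids(index_rows, mapping_rows):
    """Delete every index row that shares a sequence ID with a mapping row."""
    for seq_id in mapping_rows:
        index_rows.pop(seq_id, None)


store = {1: "John A Smith", 2: "John Smith", 3: "J A Smith", 4: "Mia Cho"}
before = clusters(store)            # rows 2 and 3 are linked only through row 1
remove_ids(store, {1: "John A Smith"})
after = clusters(store)             # the cluster falls apart without row 1
```

Rows 2 and 3 score below the threshold against each other, so they share a cluster only while row 1 links them. The pairwise score between any two remaining rows is the same before and after the removal.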