Distributed Data Cleansing

You can run multiple Process Servers in parallel to increase the throughput of the cleanse process. When you register a Process Server in the Hub Console, configure the following properties for each Process Server to enable a distributed cleanse process:

Property
  Description

Offline
  Specifies whether a Process Server is online or offline. Disable this property to ensure that a Process Server is online. The MDM Hub ignores the settings for the Offline property. Taking the Process Server online or offline is an administrative task.

Enable Cleanse Operations
  Specifies whether to use the Process Server for cleanse operations. Enable this property to use the Process Server for cleanse operations. Disable it if you do not want to use the Process Server for cleansing. Default is enabled.

Threads for Cleanse Operations
  Specifies the number of threads that the Process Server must handle for cleanse operations. Set the thread count higher than the number of CPUs available.

Enable Match Processing
  Specifies whether to use the Process Server for match operations. Enable this property to use the Process Server for match operations. Disable it if you do not want to use the Process Server for match operations. Default is enabled.

CPU Rating
  Rates the relative strength of the CPUs of the Process Server machines. Assign a higher rating to a machine with a more powerful CPU. The MDM Hub assigns jobs to machines based on CPU rating.

The MDM Hub distributes a cleanse job only if the job reaches the minimum size for distribution, which is set by the cmx.server.cleanse.min_size_for_distribution property.
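
The interaction between the minimum size for distribution, the Enable Cleanse Operations setting, and the CPU rating can be pictured with a short Python sketch. This is an illustration only: this page does not describe the MDM Hub's actual scheduling algorithm, so the proportional split, the fallback to the highest-rated server, and every name used here (ProcessServer, split_by_cpu_rating) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ProcessServer:
    # Hypothetical model of a registered Process Server; field names are illustrative.
    name: str
    cpu_rating: float
    cleanse_enabled: bool = True


def split_by_cpu_rating(record_count, servers, min_size_for_distribution=1000):
    """Return {server name: record share}. Illustrative, not the MDM Hub algorithm."""
    # Only servers with cleanse operations enabled take part.
    eligible = [s for s in servers if s.cleanse_enabled and s.cpu_rating > 0]
    if not eligible:
        raise ValueError("no Process Server is enabled for cleanse operations")

    best = max(eligible, key=lambda s: s.cpu_rating)

    # Below the threshold the job is not distributed.
    # Assumption: the whole job stays on the highest-rated server.
    if record_count < min_size_for_distribution or len(eligible) == 1:
        return {best.name: record_count}

    # Assumption: shares are proportional to each server's CPU rating.
    total = sum(s.cpu_rating for s in eligible)
    shares = {s.name: int(record_count * s.cpu_rating / total) for s in eligible}

    # Hand any rounding remainder to the highest-rated server.
    shares[best.name] += record_count - sum(shares.values())
    return shares


servers = [
    ProcessServer("ps-a", cpu_rating=1),
    ProcessServer("ps-b", cpu_rating=3),
    ProcessServer("ps-c", cpu_rating=2, cleanse_enabled=False),
]
print(split_by_cpu_rating(4000, servers))
print(split_by_cpu_rating(500, servers))
```

In this sketch a server with three times the rating receives three times the records, and a job smaller than the threshold is never split. The real behavior may differ, so treat the sketch as a way to reason about the settings rather than as a specification.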
The following table describes the distributed cleanse and match properties that you can set in the cmxcleanse.properties file:

Property
  Description

cmx.server.match.distributed_match
  Specifies whether a Process Server is enabled for distributed cleanse and match. Set to 1 to enable distributed cleanse and match.

cmx.server.cleanse.min_size_for_distribution
  Specifies the minimum size for distribution. The MDM Hub distributes the cleanse job if the minimum size for distribution is reached. The default is 1,000.
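
To check these two settings outside the Hub Console, a small Python helper can read them from text in the key=value style of cmxcleanse.properties. This is a minimal sketch under stated assumptions: the helper names (parse_properties, distribution_settings) are hypothetical, the sample values are examples, the numeric value is assumed to be written without a thousands separator, and a missing cmx.server.match.distributed_match entry is treated as "not enabled" because this page does not state its default.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def distribution_settings(props):
    """Return (distributed_enabled, min_size_for_distribution)."""
    # Documented: set to 1 to enable distributed cleanse and match.
    # Assumption: any other value, or a missing entry, means not enabled.
    enabled = props.get("cmx.server.match.distributed_match") == "1"
    # Documented default is 1,000; assumed to be stored as the plain integer 1000.
    min_size = int(props.get("cmx.server.cleanse.min_size_for_distribution", "1000"))
    return enabled, min_size


# Example excerpt in the style of cmxcleanse.properties (values are illustrative).
SAMPLE = """
# Distributed cleanse and match
cmx.server.match.distributed_match=1
cmx.server.cleanse.min_size_for_distribution=1000
"""

enabled, min_size = distribution_settings(parse_properties(SAMPLE))
print(enabled, min_size)
```

The parser handles only plain key=value lines. Line continuations and escape sequences, which the Java properties format allows, are out of scope for this sketch.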
