Distributed Data Cleansing

You can run multiple Process Servers in parallel to increase the throughput of cleanse operations and fuzzy match processes. When a specified job size is met or exceeded, the MDM Hub distributes the job among the Process Servers.
To set up distributed data cleansing, for each Process Server, set the following properties in the cmxcleanse.properties file:
cmx.server.match.distributed_match
Optional. Must be added manually. Specifies whether the Process Server participates in a distributed processing environment for cleanse operations and fuzzy match processes. Default is 1, which enables distributed processing. To disable distributed processing, set to 0.
For more information about configuring multiple Process Servers, see the Multidomain MDM Installation Guide.
cmx.server.cleanse.min_size_for_distribution
Optional. Must be added manually. Specifies the size at which a job can be distributed among the Process Servers. Default is 1000.
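As a minimal sketch, the two entries described above might look like this once added manually to cmxcleanse.properties. The values shown are the defaults stated above. The comment lines and the grouping of the two entries together are illustrative assumptions, not a required layout.

```properties
# Illustrative fragment of cmxcleanse.properties for one Process Server.
# Both entries are optional and are not present unless you add them.

# 1 = this Process Server participates in distributed processing (default)
# 0 = distributed processing is disabled for this Process Server
cmx.server.match.distributed_match=1

# Job size at which a job can be distributed among the Process Servers
# (default: 1000)
cmx.server.cleanse.min_size_for_distribution=1000
```

Because the properties are set for each Process Server, the same entries would need to be added to the cmxcleanse.properties file of every Process Server that is meant to take part.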
In the Hub Console, configure the Process Server properties as usual.
