Table of Contents

  1. Preface
  2. Installing MDM Big Data Relationship Management
  3. Configuring MDM Big Data Relationship Management
  4. Configuring Security
  5. Setting Up the Environment to Process Streaming Data
  6. Configuring Distributed Search
  7. Packaging and Deploying the RESTful Web Services

Installation and Configuration Guide

Step 3. Deploy MDM Big Data Relationship Management on Storm

You must deploy MDM Big Data Relationship Management on Storm to link, consolidate, or tokenize the input data. To deploy MDM Big Data Relationship Management on Storm, run the setup_realtime.sh script located in the following directory:
/usr/local/mdmbdrm-<Version Number>
Use the following command to run the setup_realtime.sh script:
setup_realtime.sh
--config=configuration_file_name
--rule=matching_rules_file_name
--useStorm
[--consolidate=consolidation_rules_file_name]
[--instanceName=instance_name]
[--spoutName=spout_name]
[--workers=number_of_worker_processes]
[--zookeeper=zookeeper_connection_string]
[--skipCreateTopic]
[--partitions=number_of_partitions]
[--replica=number_of_replicas]
[--outputTopic=output_topic_name]
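
Only the --config, --rule, and --useStorm options are required. For example, the following command is a minimal sketch that deploys MDM Big Data Relationship Management on Storm with default values for the optional arguments; the configuration and matching rules file paths are illustrative:
setup_realtime.sh --config=/usr/local/conf/config_big.xml --rule=/usr/local/conf/matching_rules.xml --useStorm
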
The following list describes the options and arguments that you can specify to run the setup_realtime.sh script:

--config configuration_file_name
Absolute path and file name of the configuration file that you create.

--rule matching_rules_file_name
Absolute path and file name of the matching rules file that you create. The values in the matching rules file override the values in the configuration file.

--useStorm
Indicates that the script deploys MDM Big Data Relationship Management on Storm.

--consolidate consolidation_rules_file_name
Optional. Absolute path and file name of the consolidation rules file. Use the consolidation rules file only when you want to consolidate the linked data and create preferred records for all the clusters.

--instanceName instance_name
Optional. Name for the topology that processes the input data. Default is BDRMIngest-topology.

--spoutName spout_name
Optional. Name for the spout that reads the input data and emits it into the topology. Default is BDRMIngest-spout.

--workers number_of_worker_processes
Optional. Number of worker processes for the topology. Each worker process is a physical JVM and runs a subset of all the tasks for the topology. Default is 3.

--zookeeper zookeeper_connection_string
Optional. Connection string to access the ZooKeeper server. Use the following format for the connection string:
<Host Name>:<Port>[/<chroot>]
The connection string uses the following parameters:
  • Host Name. Host name of the ZooKeeper server.
  • Port. Port on which the ZooKeeper server listens.
  • chroot. Optional. ZooKeeper root directory that you configure in Kafka. Default is /.
The following example connection string uses the default ZooKeeper root directory: server1.domain.com:2182
The following example connection string uses a user-defined ZooKeeper root directory: server1.domain.com:2182/kafkaroot
If you use an ensemble of ZooKeeper servers, specify the servers as a comma-separated list, for example: server1.domain.com:2182,server2.domain.com:2182

--skipCreateTopic
Required if the topic that you specify in the configuration file already exists in Kafka. Indicates that the script skips creating the topic. By default, the script creates the topic.

--partitions number_of_partitions
Optional. Number of partitions for the topic. Use partitions to split the data in the topic across multiple brokers. Default is 1.

--replica number_of_replicas
Optional. Number of replicas that you want to create for the topic. Use replicas for high availability. Default is 1.

--outputTopic output_topic_name
Optional. Name of the topic in Kafka to which you want to publish the output messages. By default, the output messages are not published. The script does not create the output topic, so ensure that the output topic exists before you run the script. See the example that follows this list.
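
Because the script does not create the output topic, you can create it with the Kafka command-line tools before you run the script. The following command is a sketch that assumes the kafka-topics.sh utility from the Kafka installation is on the PATH and that topics are created through ZooKeeper; the topic name, partition count, and replica count match the example command later in this topic, and the ZooKeeper connection string matches the earlier example:
kafka-topics.sh --create --zookeeper server1.domain.com:2182 --topic InsuranceOutput --partitions 3 --replication-factor 2
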
For example, the following command deploys MDM Big Data Relationship Management on Storm:
setup_realtime.sh --config=/usr/local/conf/config_big.xml --rule=/usr/local/conf/matching_rules.xml --useStorm --consolidate=/usr/local/conf/consolidationfile.xml --instanceName=Prospects --zookeeper=10.28.10.345 --skipCreateTopic --partitions=3 --replica=2 --spoutName=Insurance --workers=5 --outputTopic=InsuranceOutput
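
After the script completes, you can verify the deployment from the command line. The following commands are a sketch that assumes the Storm and Kafka command-line clients are available on the host; replace the placeholders with the ZooKeeper connection string and the topic name from your configuration file:
storm list
kafka-topics.sh --describe --zookeeper <ZooKeeper Connection String> --topic <Topic Name>
The storm list command shows the running topologies, including the topology name that you specified with the --instanceName option.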


Updated June 27, 2019

