Complete Pre-Installation Tasks

Before you install and set up Elasticsearch clusters, prepare the environment and determine whether you want to configure high availability.

Tasks for All Environments

Perform the following tasks to prepare the installation environment:
  • Ensure that each machine satisfies the hardware requirements for the supported version of Elasticsearch. For information about hardware, see the Elasticsearch documentation.
  • Ensure that each machine satisfies the software requirements for the supported version of Elasticsearch, such as the supported operating systems and Java versions. For information about the software requirements, see the Elasticsearch Support Matrix.
  • Configure the important system settings, such as swapping, file descriptors, and virtual memory. For information about important system configuration, see the Elasticsearch documentation.

Tasks for UNIX Environments

In a UNIX environment, perform the following tasks:
  • To avoid data loss caused by an insufficient number of file descriptors, set the limit on open file descriptors to 65536 or higher.
  • To prevent memory swapping, configure the system to disable swapping. For example, you can configure the Java Virtual Machine (JVM) to lock the heap in memory through mlockall, which keeps the heap from being swapped out.
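The following Python sketch checks the open file descriptor limit against the 65536 minimum. It is an illustration, not an Informatica or Elasticsearch tool, and the function names are hypothetical. It reads the soft RLIMIT_NOFILE limit of the Python process itself, which is inherited from the shell that starts it, so run it as the same user and from the same kind of session that starts Elasticsearch.

```python
import resource

MIN_FILE_DESCRIPTORS = 65536  # minimum recommended above


def fd_limit_ok(soft_limit: int, minimum: int = MIN_FILE_DESCRIPTORS) -> bool:
    """Return True if the soft limit is unlimited or at least `minimum`."""
    # Check for "unlimited" first: on some platforms RLIM_INFINITY is -1.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= minimum


def current_fd_soft_limit() -> int:
    """Soft RLIMIT_NOFILE limit of this process (inherited from the shell)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    soft = current_fd_soft_limit()
    status = "OK" if fd_limit_ok(soft) else "too low"
    print(f"open file descriptor limit: {soft} ({status})")
```

After Elasticsearch starts, you can confirm the limit that the node actually received through the nodes stats API (GET _nodes/stats/process, field max_file_descriptors), because service managers can apply limits that differ from your login shell.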

High Availability Requirements

If you have a large amount of data to index and search, the best practice is to implement a highly available Elasticsearch cluster. A highly available cluster has multiple nodes, and the cluster can distribute the workload among the nodes. If one node fails in a production environment, the cluster distributes the workload to the other nodes.
As a pre-installation task, decide whether you want to implement a highly available Elasticsearch cluster. If you do, configure the Elasticsearch cluster as usual, but ensure that you satisfy the following additional requirements:
  • The Elasticsearch cluster has three or more nodes.
    You can start with a small cluster and scale it as necessary. Analyze the workload and ensure that the remaining nodes have enough capacity to handle it if one node fails.
  • Each node is configured on a separate, dedicated machine.
  • At least three of the nodes are master nodes, which helps ensure cluster stability and performance. Elasticsearch recommends an odd number of master nodes.
    • If the cluster has only three nodes, configure all the nodes as master nodes.
    • If the cluster has more than three nodes, configure three nodes as master nodes and configure the rest of the nodes as data nodes.
  • Based on the Elasticsearch cluster size, decide on the number of replicas. When you use the Provisioning tool to configure the Elasticsearch index, you can specify the number of replicas to use.
  • For each node, set the following additional properties in the elasticsearch.yml configuration file:
    • discovery.zen.minimum_master_nodes
    • discovery.zen.ping.unicast.hosts
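The role and discovery requirements above can be sketched in Python. This is a minimal illustration under stated assumptions, not an Informatica tool: it assumes an Elasticsearch release earlier than 7.0, where node.master, node.data, and the discovery.zen.* properties still apply; it assumes that every node, including the master nodes, also holds data; and the host names and function names are hypothetical. It sets discovery.zen.minimum_master_nodes to a majority of the master nodes, (master nodes / 2) + 1, which is the quorum formula that the Elasticsearch documentation gives for that property, and it lists the master nodes in discovery.zen.ping.unicast.hosts.

```python
from typing import Dict, List


def plan_roles(hosts: List[str]) -> Dict[str, str]:
    """Make the first three hosts master nodes and the rest data nodes."""
    if len(hosts) < 3:
        raise ValueError("a highly available cluster needs three or more nodes")
    return {host: ("master" if i < 3 else "data") for i, host in enumerate(hosts)}


def minimum_master_nodes(master_count: int) -> int:
    """Majority quorum of master-eligible nodes: (master_count // 2) + 1."""
    return master_count // 2 + 1


def max_assignable_replicas(data_node_count: int) -> int:
    """A replica is never allocated on the same node as its primary shard."""
    return max(data_node_count - 1, 0)


def render_config(host: str, roles: Dict[str, str]) -> str:
    """Build elasticsearch.yml lines for one node (pre-7.0 setting names assumed)."""
    masters = [h for h, role in roles.items() if role == "master"]
    is_master = roles[host] == "master"
    unicast_hosts = ", ".join(f'"{h}"' for h in masters)
    lines = [
        f"node.name: {host}",
        f"node.master: {'true' if is_master else 'false'}",
        "node.data: true",  # assumption: every node, including masters, holds data
        f"discovery.zen.minimum_master_nodes: {minimum_master_nodes(len(masters))}",
        f"discovery.zen.ping.unicast.hosts: [{unicast_hosts}]",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical host names for a four-node cluster.
    cluster = ["es-node1", "es-node2", "es-node3", "es-node4"]
    roles = plan_roles(cluster)
    print("# highest replica count that can be fully allocated: "
          f"{max_assignable_replicas(len(cluster))}")
    for node in cluster:
        print(f"# --- elasticsearch.yml for {node} ({roles[node]} node) ---")
        print(render_config(node, roles))
```

With three master nodes the quorum is 2, so the cluster can still elect a master after one master node fails. Because Elasticsearch never places a replica on the same node as its primary shard, the sketch also reports the largest replica count that the cluster can fully allocate, which can guide the number of replicas that you specify in the Provisioning tool. If you prefer dedicated master nodes in a larger cluster, set node.data to false on them and count only the remaining nodes as data nodes.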
For more information about highly available clusters, including hardware requirements, system configurations, and property values, see the Elasticsearch documentation.
