Informatica® Big Data Management 10.2.1 on Microsoft Azure: Architecture and Best Practices

Best Practices

To achieve the best performance for Big Data Management on the Azure cloud, consider the following best practices and recommendations:
  • Create the following resources in the same geographic location and the same virtual network (VNet):
    • Azure SQL databases for the domain, monitoring, and Model repositories
    • Azure VM for the Informatica domain
    • Azure Storage (ADLS or GPv2)
    • HDInsight cluster
    • Azure Windows VM with Developer tool installation
  • Choose either ADLS or General-Purpose Storage (GPv2) for persistent data storage, depending on your use case. For example, ADLS is the more common choice for data analytics workloads.
  • Because the data resides in ADLS or GPv2 rather than on the cluster, you can terminate the HDInsight cluster with a Delete Cluster task after the job completes, which provides significant cost savings.
  • To replicate data in Azure Storage across locations, use cross-regional replication with read-access geo-redundant storage (RA-GRS). RA-GRS replicates your data to another data center in a secondary region and also lets you read from the secondary region. For details, see the Azure documentation.
  • The Spark shuffle service is enabled by default if you select Spark as the cluster type during the HDInsight cluster configuration process. Choose Spark version 2.3.0 (HDI 3.6).
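
To confirm that the Spark shuffle service is enabled on a cluster, you can inspect the Spark configuration for the spark.shuffle.service.enabled property. The following Python code is a minimal sketch, not part of the Informatica product. It assumes that you have copied the contents of the cluster's spark-defaults.conf file into a string. The function names and the sample configuration text are hypothetical and used for illustration only.

```python
import re

# Matches "key value", "key=value", or "key = value" on one line.
_PROPERTY_LINE = re.compile(r"([^\s=]+)\s*[=\s]\s*(.*)")


def parse_spark_defaults(text):
    """Parse spark-defaults.conf style text into a dict of property -> value."""
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        match = _PROPERTY_LINE.match(line)
        if match:
            properties[match.group(1)] = match.group(2).strip()
    return properties


def shuffle_service_enabled(text):
    """Return True if spark.shuffle.service.enabled is set to true.

    When the property is absent, Spark treats it as false, so this
    function does the same.
    """
    value = parse_spark_defaults(text).get("spark.shuffle.service.enabled", "false")
    return value.lower() == "true"


if __name__ == "__main__":
    # Hypothetical sample content, for illustration only.
    sample = """
    # Example spark-defaults.conf fragment
    spark.shuffle.service.enabled true
    spark.dynamicAllocation.enabled true
    """
    print(shuffle_service_enabled(sample))
```

The location of spark-defaults.conf on an HDInsight cluster is not covered here. Check the cluster configuration to find the file, or review the same property in the cluster management console.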
