Informatica® Big Data Management 10.2.1 on Microsoft Azure: Architecture and Best Practices


Best Practices

To achieve the best performance for Big Data Management on the Azure cloud, consider the following best practices and recommendations:

Create the following resources in the same geographic location and the same virtual network (VNet):

Azure SQL databases for the domain, monitoring, and Model repositories

Azure VM for the Informatica domain

Azure Storage (ADLS or GPv2)

HDInsight cluster

Azure Windows VM with Developer tool installation
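The co-location requirement above can be checked against a deployment plan before anything is provisioned. The following is a minimal Python sketch, assuming a hypothetical inventory of planned resources. The `PlannedResource` type, the resource names, the `eastus` region, and the `bdm-vnet` network name are illustrative placeholders, not values from Informatica or Azure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedResource:
    """Hypothetical record describing one resource in the deployment plan."""
    name: str
    region: str
    vnet: str


def find_misplaced(resources, region, vnet):
    """Return the names of resources outside the expected region or VNet."""
    return [r.name for r in resources if r.region != region or r.vnet != vnet]


# Illustrative plan; every value here is a placeholder.
plan = [
    PlannedResource("azure-sql-repositories", "eastus", "bdm-vnet"),
    PlannedResource("informatica-domain-vm", "eastus", "bdm-vnet"),
    PlannedResource("storage-account", "eastus", "bdm-vnet"),
    PlannedResource("hdinsight-cluster", "westus", "bdm-vnet"),
    PlannedResource("developer-tool-vm", "eastus", "bdm-vnet"),
]

misplaced = find_misplaced(plan, region="eastus", vnet="bdm-vnet")
print(misplaced)
```

In this sample plan, the HDInsight cluster is deliberately placed in a different region so that the check has something to report.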

Choose between ADLS and General-Purpose Storage (GPv2) for persistent data storage, depending on your use case. For example, ADLS is more commonly used for data analytics use cases.

With data residing in ADLS or GPv2, you can terminate the HDInsight cluster with a Delete Cluster task after the job is completed, providing significant cost savings.

To replicate data in Azure Storage across locations, use cross-regional replication with read-access geo-redundant storage (RA-GRS). RA-GRS replicates your data to another data center in a secondary region and also gives you the option to read from the secondary region. For details, see the Azure documentation.
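With RA-GRS, Azure Storage exposes the read-only secondary copy at an endpoint formed by appending `-secondary` to the storage account name. The following is a minimal Python sketch of how the two Blob endpoints relate, assuming the public Azure cloud endpoint suffix. The account name `mybdmstorage` and the function name `blob_endpoints` are hypothetical.

```python
def blob_endpoints(account_name, suffix="blob.core.windows.net"):
    """Return the (primary, secondary) Blob endpoints for an RA-GRS account.

    The default suffix assumes the public Azure cloud; other clouds use
    different suffixes.
    """
    primary = f"https://{account_name}.{suffix}"
    # The secondary endpoint is read-only and uses the "-secondary" name suffix.
    secondary = f"https://{account_name}-secondary.{suffix}"
    return primary, secondary


# "mybdmstorage" is a placeholder account name.
primary_url, secondary_url = blob_endpoints("mybdmstorage")
print(primary_url)
print(secondary_url)
```

A client that must keep reading during a primary-region outage could fall back to the secondary URL, bearing in mind that the secondary copy is replicated asynchronously and may lag behind the primary.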

The Spark shuffle service is enabled by default if you select Spark as the cluster type when you configure the HDInsight cluster. Choose Spark version 2.3.0 (HDI 3.6).
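To confirm the shuffle service setting on a cluster, you can inspect the Spark property `spark.shuffle.service.enabled`. The following is a minimal Python sketch that parses text in the style of a `spark-defaults.conf` file and reports the setting. It assumes whitespace-separated key/value lines. The sample text and the helper names are illustrative and not taken from an actual HDInsight cluster.

```python
def parse_spark_defaults(text):
    """Parse whitespace-separated 'key value' lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def shuffle_service_enabled(props):
    """Return True if the external shuffle service is switched on.

    Spark treats the property as false when it is not set.
    """
    return props.get("spark.shuffle.service.enabled", "false").lower() == "true"


# Illustrative configuration text; not copied from a real cluster.
sample = """
# Sample only
spark.shuffle.service.enabled   true
spark.dynamicAllocation.enabled true
"""

props = parse_spark_defaults(sample)
print(shuffle_service_enabled(props))
```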
