Table of Contents

  1. Preface
  2. Introduction to PowerExchange for Microsoft Azure Data Lake Storage Gen2
  3. PowerExchange for Microsoft Azure Data Lake Storage Gen2 Configuration
  4. Microsoft Azure Data Lake Storage Gen2 Connections
  5. PowerExchange for Microsoft Azure Data Lake Storage Gen2 Data Objects
  6. Microsoft Azure Data Lake Storage Gen2 Mappings
  7. Appendix A: Microsoft Azure Data Lake Storage Gen2 Datatype Reference

PowerExchange for Microsoft Azure Data Lake Storage Gen2 User Guide

Prerequisites

Before you use PowerExchange for Microsoft Azure Data Lake Storage Gen2, you must complete the following prerequisites:
  • Install and configure the Informatica services.
  • Install and configure the Developer tool. You can install the Developer tool when you install Informatica clients.
  • Create a Data Integration Service and a Model Repository Service in the Informatica domain.
  • Verify that a cluster configuration is created in the domain.
  • Verify that a Metadata Access Service is created in the domain.
  • Verify that the user who configures the Informatica domain is added to the cluster and has sudo privileges when you use a non-Kerberized Cloudera CDH 6.1 Hadoop distribution.
  • Verify that the following tasks are completed before you create a Microsoft Azure Data Lake Storage Gen2 connection:
    • Create an Azure Active Directory application to authenticate users to access the Azure Data Lake Storage Gen2 account. Grant the Storage Blob Data Contributor role to the application.
    • Create an Azure Data Lake Storage Gen2 account and grant the Contributor role to users.
    • Enable hierarchical namespaces for your Azure Data Lake Storage Gen2 account.
    • Create a file system for Microsoft Azure Data Lake Storage Gen2.
    • To access objects from an HDI 4.0 Kerberized cluster, configure the impersonation user details in your Azure Data Lake Storage Gen2 account. Grant the impersonation user the Contributor role and full access to the container used in the internal storage account of the HDInsight Data Lake Storage Gen2 cluster.
    For more information, see the Azure Data Lake Storage Gen2 documentation. A verification sketch for these tasks follows this list.
  • To successfully preview data from a local complex file or run a mapping in the native environment, you must configure the INFA_PARSER_HOME property for the Data Integration Service in Informatica Administrator. Perform the following steps to configure the INFA_PARSER_HOME property:
    • Log in to Informatica Administrator.
    • Click the Data Integration Service and then click the Processes tab on the right pane.
    • Click Edit in the Environment Variables section.
    • Click New to add an environment variable.
    • Enter the name of the environment variable as INFA_PARSER_HOME.
    • Set the value of the environment variable to the absolute path of the Hadoop distribution directory on the machine that runs the Data Integration Service.
  • To fetch metadata at design time, set the value of the environment variable to the absolute path of the Cloudera CDH 6.1 directory on the machine that runs the Metadata Access Service. For example:
    INFA_PARSER_HOME=<Informatica installation directory>/services/shared/hadoop/CDH_6.1
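
After you complete the Azure-side tasks, you can confirm that the application and file system are set up correctly before you create the connection. The following is a minimal sketch, assuming the azure-identity and azure-storage-file-datalake Python packages; the tenant ID, client ID, client secret, storage account, and file system names are placeholders for your own values, not values defined by this guide.

    # Verify that the Azure AD application can reach the Gen2 file system.
    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    credential = ClientSecretCredential(
        tenant_id="<directory-ID-of-Azure-AD>",
        client_id="<your-service-client-id>",
        client_secret="<your-service-client-secret-key>",
    )

    # Connect to the storage account over the Data Lake Storage Gen2 (dfs) endpoint.
    service = DataLakeServiceClient(
        account_url="https://<storage-account-name>.dfs.core.windows.net",
        credential=credential,
    )

    # Listing paths succeeds only if the application has the Storage Blob Data
    # Contributor role and the file system exists.
    file_system = service.get_file_system_client("<file-system-name>")
    for path in file_system.get_paths():
        print(path.name)

If the listing fails with an authorization error, recheck the role assignments described in the tasks above.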

Configure Databricks Connection Advanced Properties

Verify that a Databricks connection is created in the domain. If you want to read NULL values from or write NULL values to an Azure source, configure the following advanced properties in the Databricks connection:
  • infaspark.flatfile.reader.nullValue=True
  • infaspark.flatfile.writer.nullValue=True

Configure Microsoft Azure Data Lake Storage Gen2 Access in Azure Databricks Cluster

Set the following Hadoop credential configuration options under Spark Config in your Databricks cluster configuration to access Microsoft Azure Data Lake Storage Gen2:
fs.azure.account.auth.type OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id <your-service-client-id>
fs.azure.account.oauth2.client.secret <your-service-client-secret-key>
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<directory-ID-of-Azure-AD>/oauth2/token
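
If you want to test the same credentials before you change the cluster configuration, the options can also be set on the Spark session from a notebook. This is a minimal sketch, assuming a Python notebook attached to the cluster; the placeholder values match the cluster configuration above, and the file system, storage account, and path names are hypothetical.

    # Set the Hadoop credential options for the current Spark session only.
    # The spark object is predefined in Databricks notebooks.
    spark.conf.set("fs.azure.account.auth.type", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id", "<your-service-client-id>")
    spark.conf.set("fs.azure.account.oauth2.client.secret", "<your-service-client-secret-key>")
    spark.conf.set("fs.azure.account.oauth2.client.endpoint",
                   "https://login.microsoftonline.com/<directory-ID-of-Azure-AD>/oauth2/token")

    # Read a file over the abfss scheme to confirm that the options work.
    df = spark.read.text("abfss://<file-system>@<storage-account-name>.dfs.core.windows.net/<path>")
    df.show(5)

Session-level settings apply only to jobs run in that session; mappings that run on the cluster still require the options under Spark Config in the cluster configuration.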

Authentication Process

PowerExchange for Microsoft Azure Data Lake Storage Gen2 uses OAuth 2.0 authorization. The following image shows how PowerExchange for Microsoft Azure Data Lake Storage Gen2 receives access tokens and resource access:

[Image: The OAuth 2.0 authorization process between PowerExchange for Microsoft Azure Data Lake Storage Gen2 and the Azure Active Directory.]
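
To illustrate the flow, the following is a minimal sketch of the underlying OAuth 2.0 client credentials grant, assuming the Python requests package. The client ID, client secret, and directory ID are placeholders; the resource value is the Azure Storage audience that the /oauth2/token (v1.0) endpoint shown earlier expects.

    # Request an access token from Azure Active Directory.
    import requests

    tenant_id = "<directory-ID-of-Azure-AD>"
    response = requests.post(
        "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token",
        data={
            "grant_type": "client_credentials",
            "client_id": "<your-service-client-id>",
            "client_secret": "<your-service-client-secret-key>",
            "resource": "https://storage.azure.com/",
        },
    )
    response.raise_for_status()

    # The bearer token is then presented on each request to the Gen2 endpoint.
    access_token = response.json()["access_token"]

PowerExchange performs a comparable exchange by using the client credentials that you provide in the connection properties.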
