Table of Contents

  1. Preface
  2. Introduction to PowerExchange for Microsoft Azure Data Lake Storage Gen2
  3. PowerExchange for Microsoft Azure Data Lake Storage Gen2 Configuration
  4. Microsoft Azure Data Lake Storage Gen2 Connections
  5. PowerExchange for Microsoft Azure Data Lake Storage Gen2 Data Objects
  6. Microsoft Azure Data Lake Storage Gen2 Mappings
  7. Appendix A: Microsoft Azure Data Lake Storage Gen2 Datatype Reference

PowerExchange for Microsoft Azure Data Lake Storage Gen2 User Guide

Prerequisites

Before you use PowerExchange for Microsoft Azure Data Lake Storage Gen2, you must complete the following prerequisites:
  • Install and configure the Informatica services.
  • Install and configure the Developer tool. You can install the Developer tool when you install Informatica clients.
  • Create a Data Integration Service and a Model Repository Service in the Informatica domain.
  • Verify that a cluster configuration is created in the domain.
  • Verify that a Metadata Access Service is created in the domain.
  • Verify that the user who configures the Informatica domain is added to the cluster and has sudo privileges when you use a non-Kerberized Cloudera CDH 6.1 Hadoop distribution.
  • Verify that the following tasks are completed before you create a Microsoft Azure Data Lake Storage Gen2 connection:
    • Create an Azure Active Directory application to authenticate users to access the Azure Data Lake Storage Gen2 account. Grant the Storage Blob Data Contributor role to the application.
    • Create an Azure Data Lake Storage Gen2 account and grant the Contributor role to users.
    • Enable hierarchical namespaces for your Azure Data Lake Storage Gen2 account.
    • Create a file system for Microsoft Azure Data Lake Storage Gen2.
    • To access objects from an HDI 4.0 Kerberized cluster, configure the impersonation user details in your Azure Data Lake Storage Gen2 account. Grant the impersonation user the Contributor role and full access to the container used in the internal storage account of the HDInsight Data Lake Storage Gen2 cluster.
    For more information, see the Azure Data Lake Storage Gen2 documentation. A verification sketch for these tasks follows this list.
  • To successfully preview data from a local complex file or run a mapping in the native environment, you must configure the INFA_PARSER_HOME property for the Data Integration Service in Informatica Administrator. Perform the following steps to configure the INFA_PARSER_HOME property:
    • Log in to Informatica Administrator.
    • Click the Data Integration Service and then click the Processes tab on the right pane.
    • Click Edit in the Environment Variables section.
    • Click New to add an environment variable.
    • Enter the name of the environment variable as INFA_PARSER_HOME.
    • Set the value of the environment variable to the absolute path of the Hadoop distribution directory on the machine that runs the Data Integration Service.
  • To fetch metadata at design time, set the value of the environment variable to the absolute path of the Cloudera CDH 6.1 directory on the machine that runs the Metadata Access Service. For example:
    INFA_PARSER_HOME=<Informatica installation directory>/services/shared/hadoop/CDH_6.1
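
After you complete the Azure-side tasks, you can confirm that the application and file system are set up correctly before you create the connection. The following is a minimal sketch, assuming the azure-identity and azure-storage-file-datalake Python packages; the tenant ID, client ID, client secret, storage account, and file system names are placeholders for your own values, not values defined by this guide.

    # Verify that the Azure AD application can reach the Gen2 file system.
    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    credential = ClientSecretCredential(
        tenant_id="<directory-ID-of-Azure-AD>",
        client_id="<your-service-client-id>",
        client_secret="<your-service-client-secret-key>",
    )

    # Connect to the storage account over the Data Lake Storage Gen2 (dfs) endpoint.
    service = DataLakeServiceClient(
        account_url="https://<storage-account-name>.dfs.core.windows.net",
        credential=credential,
    )

    # Listing paths succeeds only if the application has the Storage Blob Data
    # Contributor role and the file system exists.
    file_system = service.get_file_system_client("<file-system-name>")
    for path in file_system.get_paths():
        print(path.name)

If the listing fails with an authorization error, recheck the role assignments described in the tasks above.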

Configure Databricks Connection Advanced Properties

Verify that a Databricks connection is created in the domain. If you want to read NULL values from or write NULL values to an Azure source, configure the following advanced properties in the Databricks connection:
  • infaspark.flatfile.reader.nullValue=True
  • infaspark.flatfile.writer.nullValue=True

Configure Microsoft Azure Data Lake Storage Gen2 Access in Azure Databricks Cluster

Set the following Hadoop credential configuration options under Spark Config in your Databricks cluster configuration to access Microsoft Azure Data Lake Storage Gen2:
fs.azure.account.auth.type OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id <your-service-client-id>
fs.azure.account.oauth2.client.secret <your-service-client-secret-key>
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<directory-ID-of-Azure-AD>/oauth2/token
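
If you want to test the same credentials before you change the cluster configuration, the options can also be set on the Spark session from a notebook. This is a minimal sketch, assuming a Python notebook attached to the cluster; the placeholder values match the cluster configuration above, and the file system, storage account, and path names are hypothetical.

    # Set the Hadoop credential options for the current Spark session only.
    # The spark object is predefined in Databricks notebooks.
    spark.conf.set("fs.azure.account.auth.type", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id", "<your-service-client-id>")
    spark.conf.set("fs.azure.account.oauth2.client.secret", "<your-service-client-secret-key>")
    spark.conf.set("fs.azure.account.oauth2.client.endpoint",
                   "https://login.microsoftonline.com/<directory-ID-of-Azure-AD>/oauth2/token")

    # Read a file over the abfss scheme to confirm that the options work.
    df = spark.read.text("abfss://<file-system>@<storage-account-name>.dfs.core.windows.net/<path>")
    df.show(5)

Session-level settings apply only to jobs run in that session; mappings that run on the cluster still require the options under Spark Config in the cluster configuration.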

Authentication Process

PowerExchange for Microsoft Azure Data Lake Storage Gen2 uses OAuth 2.0 authorization. The following image shows how PowerExchange for Microsoft Azure Data Lake Storage Gen2 receives access tokens and resource access:

[Image: The OAuth 2.0 authorization process between PowerExchange for Microsoft Azure Data Lake Storage Gen2 and the Azure Active Directory.]
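
To illustrate the flow, the following is a minimal sketch of the underlying OAuth 2.0 client credentials grant, assuming the Python requests package. The client ID, client secret, and directory ID are placeholders; the resource value is the Azure Storage audience that the /oauth2/token (v1.0) endpoint shown earlier expects.

    # Request an access token from Azure Active Directory.
    import requests

    tenant_id = "<directory-ID-of-Azure-AD>"
    response = requests.post(
        "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token",
        data={
            "grant_type": "client_credentials",
            "client_id": "<your-service-client-id>",
            "client_secret": "<your-service-client-secret-key>",
            "resource": "https://storage.azure.com/",
        },
    )
    response.raise_for_status()

    # The bearer token is then presented on each request to the Gen2 endpoint.
    access_token = response.json()["access_token"]

PowerExchange performs a comparable exchange by using the client credentials that you provide in the connection properties.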
