Azure Databricks Integration

Effective in version 10.2.2, you can integrate the Informatica domain with the Azure Databricks environment.
Azure Databricks is an analytics cloud platform optimized for Microsoft Azure cloud services. It is built on open-source Apache Spark cluster technology.
The Informatica domain can be installed on an Azure VM or on-premises. The integration process is similar to integration with the Hadoop environment: you perform integration tasks such as importing the cluster configuration from the Databricks environment. The Informatica domain uses token authentication to access the Databricks environment, and the Databricks token ID is stored in the Databricks connection.
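The Informatica domain performs this authentication itself, using the token stored in the Databricks connection. For readers new to Databricks token authentication, the following minimal Python sketch shows how a token is presented to the Databricks REST API: as a Bearer token in the `Authorization` header. The workspace URL and token value are placeholders, and the helper name `build_databricks_request` is hypothetical. The sketch only builds the request and does not send it.

```python
import urllib.request


def build_databricks_request(workspace_url: str, token: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a Databricks REST API request that
    authenticates with a token passed as a Bearer token.

    Hypothetical helper for illustration; Informatica does this internally.
    """
    url = workspace_url.rstrip("/") + path
    req = urllib.request.Request(url, method="GET")
    # Token authentication: the token travels in the Authorization header.
    req.add_header("Authorization", f"Bearer {token}")
    return req


req = build_databricks_request(
    "https://adb-1234567890123456.7.azuredatabricks.net",  # placeholder workspace URL
    "dapiXXXXXXXX",  # placeholder token; never hard-code a real one
    "/api/2.0/clusters/list",
)
print(req.get_method(), req.full_url)
```

Keep real tokens out of source code and scripts. In the Informatica integration, the token belongs in the Databricks connection.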

Sources and Targets

You can run mappings against the following sources and targets within the Databricks environment:

    Microsoft Azure Data Lake Store

    Microsoft Azure Blob Storage

    Microsoft Azure SQL Data Warehouse

    Microsoft Azure Cosmos DB

Transformations

You can add the following transformations to a Databricks mapping:

    Aggregator

    Expression

    Filter

    Joiner

    Lookup

    Normalizer

    Rank

    Router

    Sorter

    Union

The Databricks Spark engine processes transformations in much the same way as the Spark engine does in the Hadoop environment.
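For readers who know the Spark DataFrame API, the following Python table lists a rough Spark counterpart for each supported transformation. It is an illustrative analogy only and is not the Scala code that the Data Integration Service actually generates. The names `SPARK_ANALOGUES` and `spark_analogue` are hypothetical.

```python
# Rough Spark DataFrame counterparts for each supported transformation.
# Illustrative analogy only: NOT the code Informatica generates.
SPARK_ANALOGUES = {
    "Aggregator": "groupBy(...).agg(...)",
    "Expression": "withColumn(...) / select(...)",
    "Filter": "filter(...) / where(...)",
    "Joiner": "join(...)",
    "Lookup": "join(...) against the lookup source",
    "Normalizer": "explode(...) over a repeating group",
    "Rank": "rank() over a Window, then filter",
    "Router": "one filter(...) per output group",
    "Sorter": "orderBy(...) / sort(...)",
    "Union": "union(...) / unionByName(...)",
}


def spark_analogue(transformation: str) -> str:
    """Return the rough Spark operation for a supported transformation type."""
    try:
        return SPARK_ANALOGUES[transformation]
    except KeyError:
        raise ValueError(f"{transformation!r} is not in the supported list") from None


for name in ("Filter", "Aggregator", "Rank"):
    print(f"{name:<10} -> {spark_analogue(name)}")
```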

Data Types

The following data types are supported in Databricks mappings:

    Array

    Bigint

    Date/time

    Decimal

    Double

    Integer

    Map

    Struct

    Text

    String
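Before you validate a mapping in the Databricks environment, you may want a quick check against the transformation and data type lists above. The following Python sketch is a hypothetical pre-check that reports any transformation or port data type missing from those lists. The function and variable names and the example mapping are assumptions for illustration. Informatica's own mapping validation remains authoritative.

```python
# Hypothetical pre-check against the Databricks support lists in this section.
SUPPORTED_TRANSFORMATIONS = {
    "Aggregator", "Expression", "Filter", "Joiner", "Lookup",
    "Normalizer", "Rank", "Router", "Sorter", "Union",
}
SUPPORTED_DATA_TYPES = {
    "Array", "Bigint", "Date/time", "Decimal", "Double",
    "Integer", "Map", "Struct", "Text", "String",
}


def unsupported_items(transformations, data_types):
    """Return (unlisted transformations, unlisted data types), each sorted."""
    bad_tx = sorted(set(transformations) - SUPPORTED_TRANSFORMATIONS)
    bad_types = sorted(set(data_types) - SUPPORTED_DATA_TYPES)
    return bad_tx, bad_types


# Hypothetical mapping: "Java" and "Binary" are not in the lists, so they are flagged.
tx, types = unsupported_items(
    ["Filter", "Aggregator", "Java"],
    ["String", "Decimal", "Binary"],
)
print("Unlisted transformations:", tx)
print("Unlisted data types:", types)
```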

Mappings

When you configure a mapping, you can choose to validate and run the mapping in the Databricks environment. When you run the mapping, the Data Integration Service generates Scala code and passes it to the Databricks Spark engine.

Workflows

You can develop cluster workflows to create ephemeral clusters in the Databricks environment.
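An ephemeral cluster exists only for the duration of the work: it is created, used to run mappings, and then removed. The following minimal Python sketch shows that lifecycle pattern. A context manager guarantees cleanup even when a mapping fails. `FakeClusterClient` and `ephemeral_cluster` are hypothetical stand-ins. They are not the Informatica workflow API or the Databricks API, and a real cluster workflow performs these steps for you.

```python
from contextlib import contextmanager


class FakeClusterClient:
    """Hypothetical stand-in for a cluster API client; records each call."""

    def __init__(self):
        self.calls = []
        self._next_id = 0

    def create_cluster(self, name):
        self._next_id += 1
        cluster_id = f"{name}-{self._next_id}"
        self.calls.append(("create", cluster_id))
        return cluster_id

    def delete_cluster(self, cluster_id):
        self.calls.append(("delete", cluster_id))


@contextmanager
def ephemeral_cluster(client, name):
    """Create a cluster, yield its ID, and always delete it afterwards."""
    cluster_id = client.create_cluster(name)
    try:
        yield cluster_id
    finally:
        # Runs even if a mapping fails, so no cluster is left running.
        client.delete_cluster(cluster_id)


client = FakeClusterClient()
with ephemeral_cluster(client, "nightly") as cid:
    client.calls.append(("run mapping", cid))  # placeholder for real work
print(client.calls)
```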
For more information, refer to the following guides:

    Big Data Management 10.2.2 Integration Guide

    Big Data Management 10.2.2 Administrator Guide

    Big Data Management 10.2.2 User Guide



Updated August 28, 2020