Effective in version 10.2.2, you can integrate the Informatica domain with the Azure Databricks environment.
Azure Databricks is an analytics cloud platform optimized for Microsoft Azure cloud services. It incorporates the open-source Apache Spark cluster technology and capabilities.
The Informatica domain can be installed on an Azure VM or on premises. The integration process is similar to integration with the Hadoop environment: you perform integration tasks, such as importing the cluster configuration from the Databricks environment. The Informatica domain uses token authentication to access the Databricks environment. The Databricks token ID is stored in the Databricks connection.
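As a rough illustration of the token authentication described above, the Python sketch below shows how a client presents a Databricks personal access token, of the kind stored in the Databricks connection, as a Bearer token on a Databricks REST API call. The workspace URL and token value are placeholders, and the request is only constructed, not sent.

```python
# Minimal sketch of Databricks token authentication: the personal access
# token is sent as a Bearer token on Databricks REST API requests.
# Host and token values below are placeholders.
import urllib.request

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
DATABRICKS_TOKEN = "dapi0123456789abcdef"                               # placeholder token

def clusters_list_request(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated Clusters API request."""
    return urllib.request.Request(
        url=host + "/api/2.0/clusters/list",
        headers={"Authorization": "Bearer " + token},  # token authentication
        method="GET",
    )

request = clusters_list_request(DATABRICKS_HOST, DATABRICKS_TOKEN)
```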
Sources and Targets
You can run mappings against the following sources and targets within the Databricks environment:
Microsoft Azure Data Lake Store
Microsoft Azure Blob Storage
Microsoft Azure SQL Data Warehouse
Microsoft Azure Cosmos DB
Transformations
You can add the following transformations to a Databricks mapping:
Aggregator
Expression
Filter
Joiner
Lookup
Normalizer
Rank
Router
Sorter
Union
The Databricks Spark engine processes these transformations in much the same way as the Spark engine processes them in the Hadoop environment.
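To illustrate what two of these transformations compute, the pure-Python sketch below mimics a Filter transformation followed by an Aggregator transformation over a few sample rows. This is an illustration of the semantics only, not Informatica API code and not the Scala that the engine generates.

```python
# Pure-Python illustration (not Informatica or Spark code) of a Filter
# transformation followed by an Aggregator transformation.
from collections import defaultdict

rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "west", "amount": 2},
]

# Filter transformation: keep only rows that satisfy the filter condition.
filtered = [r for r in rows if r["amount"] > 3]

# Aggregator transformation: group by one port and sum another.
totals = defaultdict(int)
for r in filtered:
    totals[r["region"]] += r["amount"]

print(dict(totals))  # {'east': 17, 'west': 5}
```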
Data Types
The following data types are supported:
Array
Bigint
Date/time
Decimal
Double
Integer
Map
Struct
Text
String
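Because the Databricks Spark engine runs mappings as Spark code, each of these types has a natural Spark SQL counterpart. The table below is an illustrative assumption based on standard Spark SQL type names, not an official Informatica conversion table.

```python
# Illustrative (assumed) correspondence between the supported data types
# and Spark SQL types; not an official Informatica conversion table.
SPARK_TYPE_FOR = {
    "Array":     "ArrayType",
    "Bigint":    "LongType",
    "Date/time": "TimestampType",
    "Decimal":   "DecimalType",
    "Double":    "DoubleType",
    "Integer":   "IntegerType",
    "Map":       "MapType",
    "Struct":    "StructType",
    "Text":      "StringType",
    "String":    "StringType",
}

print(SPARK_TYPE_FOR["Bigint"])  # LongType
```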
Mappings
When you configure a mapping, you can choose to validate and run the mapping in the Databricks environment. When you run the mapping, the Data Integration Service generates Scala code and passes it to the Databricks Spark engine.
Workflows
You can develop cluster workflows to create ephemeral clusters in the Databricks environment.
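An ephemeral cluster exists only for the duration of a run and is terminated when the run finishes. As a hedged sketch of the underlying mechanism, the Databricks Jobs API accepts a `new_cluster` specification on a one-time run submission (`POST /api/2.0/jobs/runs/submit`). The payload below is only constructed, not sent; the runtime version, node type, and notebook path are placeholders.

```python
# Sketch of an ephemeral-cluster request body for the Databricks Jobs API
# (POST /api/2.0/jobs/runs/submit): the cluster described in "new_cluster"
# is created for this run and terminated when the run finishes.
# Values are placeholders; the payload is built here but not sent.
import json

run_submit_payload = {
    "run_name": "ephemeral-cluster-demo",
    "new_cluster": {                          # cluster exists only for this run
        "spark_version": "5.5.x-scala2.11",   # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",    # placeholder Azure VM size
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Demo/example"},  # placeholder path
}

body = json.dumps(run_submit_payload)
```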
For more information, refer to the following guides: