Azure Databricks Integration
Effective in version 10.2.2, you can integrate the Informatica domain with the Azure Databricks environment.
Azure Databricks is a cloud analytics platform that is optimized for Microsoft Azure cloud services. It incorporates open-source Apache Spark cluster technologies and capabilities.
The Informatica domain can be installed on an Azure VM or on-premises. The integration process is similar to the integration with the Hadoop environment. You perform integration tasks, including importing the cluster configuration from the Databricks environment. The Informatica domain uses token authentication to access the Databricks environment. The Databricks token ID is stored in the Databricks connection.
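To illustrate what token authentication against a Databricks workspace looks like at the HTTP level, the following minimal Python sketch builds a request that carries a token as a bearer credential. It is not how the Informatica domain is implemented. The workspace URL and token value are placeholders, and the helper name build_databricks_request is hypothetical. The request is only constructed, not sent.

```python
import urllib.request


def build_databricks_request(workspace_url: str, token: str, api_path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that uses bearer-token authentication."""
    # Join the workspace base URL and the API path with exactly one slash.
    url = workspace_url.rstrip("/") + "/" + api_path.lstrip("/")
    # The Databricks REST API expects the token in the Authorization header.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values for illustration only; substitute a real workspace URL and token.
request = build_databricks_request(
    "https://adb-0000000000000000.0.azuredatabricks.net",
    "dapiEXAMPLETOKEN",
    "/api/2.0/clusters/list",
)
print(request.full_url)
print(request.get_method())
```

In the integration itself, you enter the token in the Databricks connection rather than in code.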
Sources and Targets
You can run mappings against the following sources and targets within the Databricks environment:
Microsoft Azure Data Lake Store
Microsoft Azure Blob Storage
Microsoft Azure SQL Data Warehouse
Microsoft Azure Cosmos DB
You can add the following transformations to a Databricks mapping:
The Databricks Spark engine processes each transformation in much the same way as the Spark engine does in the Hadoop environment.
The following data types are supported:
When you configure a mapping, you can choose to validate and run the mapping in the Databricks environment. When you run the mapping, the Data Integration Service generates Scala code and passes it to the Databricks Spark engine.
You can develop cluster workflows to create ephemeral clusters in the Databricks environment.
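As a rough illustration of what an ephemeral cluster amounts to on the Databricks side, the sketch below assembles a cluster specification of the kind accepted by the Databricks Clusters REST API (POST /api/2.0/clusters/create). It does not show what a cluster workflow sends. The helper name ephemeral_cluster_spec and all values (cluster name, runtime version, node type, worker count, idle timeout) are assumptions chosen for illustration.

```python
import json


def ephemeral_cluster_spec(name, spark_version, node_type_id, num_workers, idle_minutes):
    """Return a cluster specification as a plain dictionary."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Terminate the cluster automatically after this many idle minutes,
        # as a safety net in case it is not deleted explicitly.
        "autotermination_minutes": idle_minutes,
    }


# Placeholder values; the runtime version and node type depend on your workspace.
spec = ephemeral_cluster_spec(
    name="nightly-mapping-run",
    spark_version="5.2.x-scala2.11",
    node_type_id="Standard_DS3_v2",
    num_workers=2,
    idle_minutes=30,
)
create_body = json.dumps(spec, indent=2)
print(create_body)
```

The cluster exists only for the duration of the work: it is created, the mappings run on it, and it is then terminated.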
For more information, refer to the following guides:
Big Data Management 10.2.2 Integration Guide
Big Data Management 10.2.2 Administrator Guide
Big Data Management 10.2.2 User Guide