Effective in version 10.2.2, you can integrate the Informatica domain with the Azure Databricks environment.
Azure Databricks is an analytics cloud platform optimized for Microsoft Azure cloud services. It incorporates the open-source Apache Spark cluster technology and capabilities.
The Informatica domain can be installed on an Azure VM or on premises. The integration process is similar to integration with the Hadoop environment: you perform integration tasks, such as importing the cluster configuration from the Databricks environment. The Informatica domain uses token authentication to access the Databricks environment. The Databricks token ID is stored in the Databricks connection.
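As a rough illustration of the token authentication described above, the Python sketch below shows how a client presents a Databricks personal access token, of the kind stored in the Databricks connection, as a Bearer token on a Databricks REST API call. The workspace URL and token value are placeholders, and the request is only constructed, not sent.

```python
# Minimal sketch of Databricks token authentication: the personal access
# token is sent as a Bearer token on Databricks REST API requests.
# Host and token values below are placeholders.
import urllib.request

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
DATABRICKS_TOKEN = "dapi0123456789abcdef"                               # placeholder token

def clusters_list_request(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated Clusters API request."""
    return urllib.request.Request(
        url=host + "/api/2.0/clusters/list",
        headers={"Authorization": "Bearer " + token},  # token authentication
        method="GET",
    )

request = clusters_list_request(DATABRICKS_HOST, DATABRICKS_TOKEN)
```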
Sources and Targets
You can run mappings against the following sources and targets within the Databricks environment:
Microsoft Azure Data Lake Store
Microsoft Azure Blob Storage
Microsoft Azure SQL Data Warehouse
Microsoft Azure Cosmos DB
Transformations
You can add the following transformations to a Databricks mapping:
Aggregator
Expression
Filter
Joiner
Lookup
Normalizer
Rank
Router
Sorter
Union
The Databricks Spark engine processes these transformations in much the same way as the Spark engine processes them in the Hadoop environment.
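To illustrate what two of these transformations compute, the pure-Python sketch below mimics a Filter transformation followed by an Aggregator transformation over a few sample rows. This is an illustration of the semantics only, not Informatica API code and not the Scala that the engine generates.

```python
# Pure-Python illustration (not Informatica or Spark code) of a Filter
# transformation followed by an Aggregator transformation.
from collections import defaultdict

rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "west", "amount": 2},
]

# Filter transformation: keep only rows that satisfy the filter condition.
filtered = [r for r in rows if r["amount"] > 3]

# Aggregator transformation: group by one port and sum another.
totals = defaultdict(int)
for r in filtered:
    totals[r["region"]] += r["amount"]

print(dict(totals))  # {'east': 17, 'west': 5}
```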
Data Types
The following data types are supported:
Array
Bigint
Date/time
Decimal
Double
Integer
Map
Struct
Text
String
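Because the Databricks Spark engine runs mappings as Spark code, each of these types has a natural Spark SQL counterpart. The table below is an illustrative assumption based on standard Spark SQL type names, not an official Informatica conversion table.

```python
# Illustrative (assumed) correspondence between the supported data types
# and Spark SQL types; not an official Informatica conversion table.
SPARK_TYPE_FOR = {
    "Array":     "ArrayType",
    "Bigint":    "LongType",
    "Date/time": "TimestampType",
    "Decimal":   "DecimalType",
    "Double":    "DoubleType",
    "Integer":   "IntegerType",
    "Map":       "MapType",
    "Struct":    "StructType",
    "Text":      "StringType",
    "String":    "StringType",
}

print(SPARK_TYPE_FOR["Bigint"])  # LongType
```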
Mappings
When you configure a mapping, you can choose to validate and run the mapping in the Databricks environment. When you run the mapping, the Data Integration Service generates Scala code and passes it to the Databricks Spark engine.
Workflows
You can develop cluster workflows to create ephemeral clusters in the Databricks environment.
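An ephemeral cluster exists only for the duration of a run and is terminated when the run finishes. As a hedged sketch of the underlying mechanism, the Databricks Jobs API accepts a `new_cluster` specification on a one-time run submission (`POST /api/2.0/jobs/runs/submit`). The payload below is only constructed, not sent; the runtime version, node type, and notebook path are placeholders.

```python
# Sketch of an ephemeral-cluster request body for the Databricks Jobs API
# (POST /api/2.0/jobs/runs/submit): the cluster described in "new_cluster"
# is created for this run and terminated when the run finishes.
# Values are placeholders; the payload is built here but not sent.
import json

run_submit_payload = {
    "run_name": "ephemeral-cluster-demo",
    "new_cluster": {                          # cluster exists only for this run
        "spark_version": "5.5.x-scala2.11",   # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",    # placeholder Azure VM size
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Demo/example"},  # placeholder path
}

body = json.dumps(run_submit_payload)
```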
For more information, refer to the following guides: