Databricks Integration Overview

Big Data Management can connect to Azure Databricks, a cloud analytics platform that is optimized for Microsoft Azure cloud services. Azure Databricks incorporates open-source Apache Spark cluster technologies and capabilities.
The Data Integration Service automatically installs the binaries required to integrate the Informatica domain with the Databricks environment. The integration requires Informatica connection objects and cluster configurations. A cluster configuration is a domain object that contains configuration parameters that you import from the Databricks cluster. You then associate the cluster configuration with connections to access the Databricks environment.
Perform the following tasks to integrate the Informatica domain with the Databricks environment:
  1. Install or upgrade to the current Informatica version.
  2. Perform pre-import tasks, such as verifying system requirements and permissions.
  3. Import the cluster configuration into the domain.
  4. Create a Databricks connection to run mappings within the Databricks environment.
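As an illustration of the pre-import checks in step 2, the following minimal Python sketch confirms that a Databricks cluster is reachable with a given access token before you import its configuration. It is not part of the Informatica tooling. It assumes the Databricks Clusters REST API endpoint `/api/2.0/clusters/get` and personal access token authentication. The function names, workspace URL, token, and cluster ID are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request


def build_cluster_get_request(
    workspace_url: str, token: str, cluster_id: str
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Databricks Clusters API.

    Assumption: the workspace exposes /api/2.0/clusters/get and accepts a
    personal access token in a Bearer Authorization header.
    """
    base = workspace_url.rstrip("/")
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    return urllib.request.Request(
        f"{base}/api/2.0/clusters/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_cluster_state(
    request: urllib.request.Request, timeout: float = 10.0
) -> str:
    """Send the request and return the cluster state reported by Databricks.

    Raises urllib.error.HTTPError if the token lacks permission or the
    cluster ID is unknown, which is the signal this check is looking for.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("state", "UNKNOWN")


if __name__ == "__main__":
    # Hypothetical values; replace with your own workspace URL, token, and cluster ID.
    req = build_cluster_get_request(
        "https://example.azuredatabricks.net", "<access-token>", "<cluster-id>"
    )
    print(req.full_url)
```

Building the request separately from sending it lets you inspect the URL and headers without network access. Call `fetch_cluster_state(req)` only from a machine that can reach the workspace.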