Databricks Integration Overview

Data Engineering Integration can connect to Databricks on Azure and on AWS. Databricks is a cloud analytics platform that you can use with Microsoft Azure cloud services or Amazon Web Services. It is built on open-source Apache Spark cluster technology.
The Data Integration Service installs the binaries required to integrate the Informatica domain with the Databricks environment. The integration requires Informatica connection objects and cluster configurations. A cluster configuration is a domain object that contains configuration parameters that you import from the Databricks cluster. You then associate the cluster configuration with connections to access the Databricks environment.
Perform the following tasks to integrate the Informatica domain with the Databricks environment:
  1. Install or upgrade to the current Informatica version.
  2. Perform pre-import tasks, such as verifying system requirements and permissions.
  3. Import the cluster configuration into the domain.
  4. Create a Databricks connection to run mappings within the Databricks environment.
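The relationship between a cluster configuration and a connection (tasks 3 and 4) can be pictured with a small data model. The following Python sketch is illustrative only: the class names, function names, object names, and the sample property are hypothetical and are not part of any Informatica or Databricks API. It assumes that a cluster configuration is a named set of parameters copied from the cluster, and that a connection refers to exactly one cluster configuration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClusterConfiguration:
    """Hypothetical stand-in for a cluster configuration domain object."""
    name: str
    # Configuration parameters imported from the Databricks cluster.
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class DatabricksConnection:
    """Hypothetical stand-in for a Databricks connection object."""
    name: str
    # The cluster configuration associated with this connection.
    cluster_configuration: ClusterConfiguration


def import_cluster_configuration(name: str, cluster_properties: Dict[str, str]) -> ClusterConfiguration:
    """Task 3 (sketch): copy parameters read from the cluster into a domain-side object."""
    return ClusterConfiguration(name=name, properties=dict(cluster_properties))


def create_connection(name: str, cluster_configuration: ClusterConfiguration) -> DatabricksConnection:
    """Task 4 (sketch): create a connection that is associated with a cluster configuration."""
    return DatabricksConnection(name=name, cluster_configuration=cluster_configuration)


# Illustrative values only; real parameters come from the Databricks cluster.
cluster_properties = {"spark.executor.memory": "4g"}
config = import_cluster_configuration("dbx_cluster_config", cluster_properties)
connection = create_connection("dbx_connection", config)
print(connection.cluster_configuration.name)
```

In this sketch the connection holds a reference to the cluster configuration rather than a copy of its parameters, which mirrors the idea that you associate the cluster configuration with connections instead of repeating cluster settings in each connection.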