Databricks Integration Overview

Data Engineering Integration can connect to Databricks on Azure and on AWS. Databricks is a cloud analytics platform that you can use with Microsoft Azure cloud services or Amazon Web Services (AWS). Databricks incorporates open-source Apache Spark cluster technologies and capabilities.
The Data Integration Service installs the binaries required to integrate the Informatica domain with the Databricks environment. The integration requires Informatica connection objects and cluster configurations. A cluster configuration is a domain object that contains configuration parameters that you import from the Databricks cluster. You then associate the cluster configuration with connections to access the Databricks environment.
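
The relationship between a cluster configuration and the connections that use it can be pictured with a small model. The following Python sketch is illustrative only: the class names, the `import_cluster_configuration` helper, and the sample property are hypothetical and do not correspond to any Informatica API or object schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClusterConfiguration:
    """Hypothetical stand-in for the domain object that holds imported cluster parameters."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class DatabricksConnection:
    """Hypothetical stand-in for a connection that is associated with a cluster configuration."""
    name: str
    cluster_configuration: ClusterConfiguration


def import_cluster_configuration(name: str, cluster_properties: Dict[str, str]) -> ClusterConfiguration:
    # Copy the parameters so that later changes on the cluster side
    # do not silently alter the imported domain object.
    return ClusterConfiguration(name=name, properties=dict(cluster_properties))


# Assumed sample values: import the parameters first, then associate them with a connection.
cluster_config = import_cluster_configuration(
    "dbx_cluster_config", {"spark.executor.memory": "4g"}
)
connection = DatabricksConnection("dbx_connection", cluster_config)
```

In this model, the connection refers to the cluster configuration rather than copying its parameters, which mirrors the association described above.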
Unless otherwise specified, all of the functionality described in this article applies to Informatica 10.4 and later releases. Informatica 10.5 adds support for warm pool access and the Sequence Generator transformation.
Perform the following tasks to integrate the Informatica domain with the Databricks environment:
  1. Install or upgrade to the current Informatica version.
  2. Perform pre-import tasks, such as verifying system requirements and permissions.
  3. Import the cluster configuration into the domain.
  4. Create a Databricks connection to run mappings within the Databricks environment.
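
As one possible pre-import check, you might confirm that the Databricks cluster is reachable before you import the cluster configuration. The following Python sketch is not part of the Informatica product. It assumes that the workspace exposes the Databricks Clusters REST API 2.0 and that you authenticate with a personal access token. The workspace URL, cluster ID, and token shown in the usage note are placeholders.

```python
import json
import urllib.parse
import urllib.request


def build_cluster_request(workspace_url: str, cluster_id: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Databricks clusters/get endpoint."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    url = f"{workspace_url.rstrip('/')}/api/2.0/clusters/get?{query}"
    # Personal access tokens are sent as a bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_cluster_state(workspace_url: str, cluster_id: str, token: str, timeout: float = 30.0):
    """Return the cluster state reported by the workspace, for example RUNNING or TERMINATED."""
    request = build_cluster_request(workspace_url, cluster_id, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("state")
```

For example, `fetch_cluster_state("https://<workspace-host>", "<cluster-id>", "<token>")` returns the reported state, or raises an error if the workspace rejects the token or cannot be reached. Either outcome indicates whether the permissions and network access needed for the import are likely to be in place.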