Databricks Integration Overview

Data Engineering Integration can connect to Databricks on Azure and on AWS. Databricks is an analytics cloud platform that you can use with Microsoft Azure cloud services or Amazon Web Services. Databricks incorporates the open-source Apache Spark cluster technologies and capabilities.
The Data Integration Service installs the binaries required to integrate the Informatica domain with the Databricks environment. The integration requires Informatica connection objects and cluster configurations. A cluster configuration is a domain object that contains configuration parameters that you import from the Databricks cluster. You then associate the cluster configuration with connections to access the Databricks environment.
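The relationship between the two domain objects can be sketched as a small data model. This is a minimal, hypothetical illustration, not an Informatica API: the class names `ClusterConfiguration`, `Connection`, and `Domain`, their methods, and the sample parameter are all assumptions chosen to show that a cluster configuration stores imported parameters and that a connection can only be associated with a configuration that already exists in the domain.

```python
from dataclasses import dataclass, field


# Hypothetical model for illustration only; none of these classes
# correspond to real Informatica APIs.
@dataclass
class ClusterConfiguration:
    """Domain object holding parameters imported from a Databricks cluster."""
    name: str
    parameters: dict[str, str] = field(default_factory=dict)


@dataclass
class Connection:
    """Connection object associated with one cluster configuration."""
    name: str
    cluster_config: ClusterConfiguration


class Domain:
    """Minimal stand-in for the domain's registry of objects."""

    def __init__(self) -> None:
        self.cluster_configs: dict[str, ClusterConfiguration] = {}
        self.connections: dict[str, Connection] = {}

    def import_cluster_config(self, name: str,
                              parameters: dict[str, str]) -> ClusterConfiguration:
        # Copy the imported parameters so the domain object owns its own values.
        config = ClusterConfiguration(name, dict(parameters))
        self.cluster_configs[name] = config
        return config

    def create_connection(self, name: str, cluster_config_name: str) -> Connection:
        # A connection can only reference a configuration that was already imported.
        if cluster_config_name not in self.cluster_configs:
            raise KeyError(f"cluster configuration {cluster_config_name!r} has not been imported")
        conn = Connection(name, self.cluster_configs[cluster_config_name])
        self.connections[name] = conn
        return conn
```

For example, `Domain().import_cluster_config("dbx_cco", {"spark.executor.memory": "4g"})` followed by `create_connection("dbx_conn", "dbx_cco")` mirrors the import-then-associate order; the names and the parameter value are placeholders.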
With the following exceptions, all of the functionality described in this article applies to Informatica 10.4 and later releases:
  • Informatica 10.5 adds support for warm pool access and the Sequence Generator transformation.
  • Informatica 10.5.2 adds support for Databricks schema evolution and custom parameters.
For more information about these features, see the Data Engineering Integration User Guide.
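The release-dependent features listed above can be expressed as a simple lookup keyed by version. This is a minimal sketch, assuming dotted numeric version strings such as `10.5.2`; the function names `parse_version` and `added_features` are hypothetical and only encode the feature additions stated in this section on top of the 10.4 baseline.

```python
# Feature additions per release, as stated in this section. Everything else
# described here is available from the 10.4 baseline onward.
FEATURES_BY_RELEASE: dict[tuple[int, ...], set[str]] = {
    (10, 5): {"warm pool access", "Sequence Generator transformation"},
    (10, 5, 2): {"Databricks schema evolution", "custom parameters"},
}


def parse_version(version: str) -> tuple[int, ...]:
    # "10.5.2" -> (10, 5, 2); tuples compare element by element.
    return tuple(int(part) for part in version.split("."))


def added_features(version: str) -> set[str]:
    """Return the post-10.4 features available in the given release."""
    v = parse_version(version)
    if v < (10, 4):
        raise ValueError("the functionality described here requires 10.4 or later")
    result: set[str] = set()
    for release, features in FEATURES_BY_RELEASE.items():
        if v >= release:
            result |= features
    return result
```

Tuple comparison handles releases of different lengths: `(10, 5, 1)` is at least `(10, 5)` but less than `(10, 5, 2)`, so 10.5.1 gets warm pool access but not schema evolution.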
Perform the following tasks to integrate the Informatica domain with the Databricks environment:
  1. Install or upgrade to the current Informatica version.
  2. Perform pre-import tasks, such as verifying system requirements and permissions.
  3. Import the cluster configuration into the domain.
  4. Create a Databricks connection to run mappings within the Databricks environment.