Table of Contents

Search

  1. Preface
  2. Part 1: Hadoop Integration
  3. Part 2: Databricks Integration
  4. Appendix A: Connections Reference

Run-time Process on the Databricks Spark Engine

Run-time Process on the Databricks Spark Engine

When you run a job on the Databricks Spark engine, the Data Integration Service pushes the processing to the Databricks cluster, and the Databricks Spark engine runs the job.
The following image shows the components of the Informatica and the Databricks environments:
The image shows the Data Integration Service, the LDTM, and the Databricks engine executor under the native environment. In the Databricks environment, the image shows several nodes in a Databricks cluster.
  1. The Logical Data Transformation Manager translates the mapping into a Scala program, packages it as an application, and sends it to the Databricks Engine Executor on the Data Integration Service machine.
  2. The Databricks Engine Executor submits the application through REST API to the Databricks cluster, requests to run the application, and stages files for access during run time.
  3. The Databricks cluster passes the request to the Databricks Spark driver on the driver node.
  4. The Databricks Spark driver distributes the job to one or more Databricks Spark executors that reside on worker nodes.
  5. The executors run the job and stage run-time data to the Databricks File System (DBFS) of the workspace.