Big Data Management Administrator Guide

Run-time Process on the Databricks Spark Engine

When you run a job on the Databricks Spark engine, the Data Integration Service pushes the processing to the Databricks cluster, and the Databricks Spark engine runs the job.
The following image shows the components of the Informatica and Databricks environments.
The run-time process consists of the following steps:
  1. The Logical Data Transformation Manager translates the mapping into a Scala program, packages it as an application, and sends it to the Databricks Engine Executor on the Data Integration Service machine.
  2. The Databricks Engine Executor submits the application to the Databricks cluster through the REST API, requests that the cluster run the application, and stages files for access at run time.
  3. The Databricks cluster passes the request to the Databricks Spark driver on the driver node.
  4. The Databricks Spark driver distributes the job to one or more Databricks Spark executors that reside on worker nodes.
  5. The executors run the job and stage run-time data to the Databricks File System (DBFS) of the workspace.
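
To make step 2 more concrete, the following Python sketch builds the kind of REST request that could ask a Databricks cluster to run a packaged application. It is an illustration only, not the Databricks Engine Executor's actual implementation. It assumes the public Databricks Jobs API endpoint `/api/2.1/jobs/runs/submit`, which this guide does not name, and the function name, host, token, cluster ID, JAR path, and class name are all hypothetical placeholders. The request is built but never sent.

```python
import json
import urllib.request


def build_submit_request(host, token, cluster_id, jar_dbfs_path, main_class):
    """Build (but do not send) a one-time run request for a packaged Spark application.

    Assumption: the application JAR has already been staged at jar_dbfs_path.
    """
    payload = {
        "run_name": "mapping-run",  # hypothetical run name
        "tasks": [
            {
                "task_key": "run_mapping",
                # Run on a cluster that already exists in the workspace.
                "existing_cluster_id": cluster_id,
                # Entry point of the packaged Scala application.
                "spark_jar_task": {"main_class_name": main_class},
                # Staged application JAR that the cluster loads at run time.
                "libraries": [{"jar": jar_dbfs_path}],
            }
        ],
    }
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders, not real workspace details.
request = build_submit_request(
    host="https://example.cloud.databricks.com",
    token="dapi-EXAMPLE",
    cluster_id="0000-000000-example",
    jar_dbfs_path="dbfs:/tmp/example/mapping_app.jar",
    main_class="com.example.MappingApp",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to the workspace. The sketch stops before that point so that it runs without network access or credentials.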
