Table of Contents

Search

  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Mappings
  4. Sources
  5. Targets
  6. Transformations
  7. Data Preview
  8. Cluster Workflows
  9. Profiles
  10. Monitoring
  11. Hierarchical Data Processing
  12. Hierarchical Data Processing Configuration
  13. Hierarchical Data Processing with Schema Changes
  14. Intelligent Structure Models
  15. Stateful Computing
  16. Appendix A: Connections
  17. Appendix B: Data Type Reference
  18. Appendix C: Function Reference

Run-time Process on the Databricks Spark Engine

Run-time Process on the Databricks Spark Engine

When you run a job on the Databricks Spark engine, the Data Integration Service pushes the processing to the Databricks cluster, and the Databricks Spark engine runs the job.
The following image shows the components of the Informatica and the Databricks environments:
  1. The Logical Data Transformation Manager translates the mapping into a Scala program, packages it as an application, and sends it to the Databricks Engine Executor on the Data Integration Service machine.
  2. The Databricks Engine Executor submits the application through REST API to the Databricks cluster, requests to run the application, and stages files for access during run time.
  3. The Databricks cluster passes the request to the Databricks Spark driver on the driver node.
  4. The Databricks Spark driver distributes the job to one or more Databricks Spark executors that reside on worker nodes.
  5. The executors run the job and stage run-time data to the Databricks File System (DBFS) of the workspace.


Updated July 10, 2020