Run-time Process on the Databricks Spark Engine

When you run a job on the Databricks Spark engine, the Data Integration Service pushes the processing to the Databricks cluster, and the Databricks Spark engine runs the job.
The following image shows the components of the Informatica and Databricks environments:
In the native environment, the image shows the Data Integration Service, the Logical Data Transformation Manager (LDTM), and the Databricks Engine Executor. In the Databricks environment, the image shows several nodes in a Databricks cluster.
The following steps describe the run-time process:
  1. The Logical Data Transformation Manager translates the mapping into a Scala program, packages it as an application, and sends it to the Databricks Engine Executor on the Data Integration Service machine.
  2. The Databricks Engine Executor submits the application to the Databricks cluster through the REST API, requests that the cluster run the application, and stages files for access at run time.
  3. The Databricks cluster passes the request to the Databricks Spark driver on the driver node.
  4. The Databricks Spark driver distributes the job to one or more Databricks Spark executors that reside on worker nodes.
  5. The executors run the job and stage run-time data to the Databricks File System (DBFS) of the workspace.
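
The handoffs in the steps above can be traced with a small toy model. This is a minimal sketch, not Informatica or Databricks code: the class and function names (`Application`, `Cluster`, `ldtm_translate`, `engine_executor_submit`), the round-robin distribution, the `/staging/...` paths, and the uppercase "transformation" are all assumptions made for illustration. The real product translates the mapping into a Scala program and submits it through the REST API.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    """Hypothetical stand-in for the packaged application (step 1)."""
    mapping_name: str
    partitions: list


@dataclass
class Cluster:
    """Toy cluster: a driver plus named executors on worker nodes."""
    executors: list
    dbfs: dict = field(default_factory=dict)  # stands in for DBFS staging

    def run(self, app):
        # Step 3: the cluster passes the request to the driver.
        return self._driver(app)

    def _driver(self, app):
        # Step 4: the driver distributes work to executors.
        # Round-robin is an assumption; Spark schedules tasks differently.
        assignments = {name: [] for name in self.executors}
        for i, part in enumerate(app.partitions):
            assignments[self.executors[i % len(self.executors)]].append(part)
        # Step 5: each executor runs its share and stages run-time data.
        for name, parts in assignments.items():
            path = f"/staging/{app.mapping_name}/{name}"  # hypothetical path
            self.dbfs[path] = [p.upper() for p in parts]
        return assignments


def ldtm_translate(mapping_name, rows):
    # Step 1: translate the mapping and package it as an application.
    return Application(mapping_name, list(rows))


def engine_executor_submit(app, cluster):
    # Step 2: submit the application and request a run.
    # In the real product this is a REST call; here it is a method call.
    return cluster.run(app)


cluster = Cluster(executors=["executor-0", "executor-1"])
app = ldtm_translate("m_orders", ["a", "b", "c"])
assignments = engine_executor_submit(app, cluster)
print(assignments)
print(cluster.dbfs)
```

The model keeps the division of responsibility from the steps: translation and submission happen on the Data Integration Service side, while distribution, execution, and staging happen inside the cluster.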


Updated September 28, 2020