Table of Contents

  1. Preface
  2. Introduction to Informatica Data Engineering Integration
  3. Mappings
  4. Mapping Optimization
  5. Sources
  6. Targets
  7. Transformations
  8. Python Transformation
  9. Data Preview
  10. Cluster Workflows
  11. Profiles
  12. Monitoring
  13. Hierarchical Data Processing
  14. Hierarchical Data Processing Configuration
  15. Hierarchical Data Processing with Schema Changes
  16. Intelligent Structure Models
  17. Blockchain
  18. Stateful Computing
  19. Appendix A: Connections Reference
  20. Appendix B: Data Type Reference
  21. Appendix C: Function Reference

Data Preview Process

When you preview data on the Spark engine, the following process occurs:
  1. The Data Integration Service uses the Data Preview Service Module to determine whether to run the preview job in the native environment or on the Spark engine.
  2. The Data Preview Service Module pushes the job to the Spark engine and generates a Spark workflow based on the preview point.
  3. Based on the cluster distribution that you configure, either the Spark Jobserver or the DTM submits the Spark workflow tasks to the Hadoop cluster to generate the preview data.
  4. The run-time engine stages the data in the configured HDFS staging directory (see the sketch after this list).
  5. The Data Integration Service passes the staged data to the Developer tool and then deletes the staged data.
  6. The results of the preview appear in the data viewer of the Developer tool.
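
Steps 4 and 5 describe a stage-read-delete pattern: the preview rows land in a staging directory, the service fetches them for the client, and the staged copy is then removed. The following PySpark sketch is a hypothetical illustration of that pattern only, not Informatica code; the staging path, row limit, sample data, and session name are assumptions.

from pyspark.sql import SparkSession

# Illustrative staging location. In the actual product this is the HDFS
# staging directory configured for the run-time engine.
STAGING_DIR = "hdfs:///tmp/preview_staging/job_0001"

spark = SparkSession.builder.appName("preview-staging-sketch").getOrCreate()

# Step 4 (sketch): write the rows produced at the preview point to the
# staging directory. A preview returns a bounded sample, hence the limit.
rows = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "val"])
rows.limit(100).write.mode("overwrite").parquet(STAGING_DIR)

# Step 5 (sketch): read the staged data back to hand it to the client...
preview_rows = spark.read.parquet(STAGING_DIR).collect()

# ...then delete the staged copy once it has been fetched.
jvm = spark._jvm
fs = jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
fs.delete(jvm.org.apache.hadoop.fs.Path(STAGING_DIR), True)

spark.stop()

The essential point is the cleanup at the end: the staged copy exists only long enough for the client to fetch the preview rows.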
When you run data preview, the Data Integration Service validates the validation environments selected in the Run-time view.
If you enable the HTTPS protocol on the Data Integration Service, the Spark Jobserver also uses HTTPS. The Spark Jobserver uses the same HTTPS keystore configuration that you set in the Data Integration Service process properties in the Administrator tool.
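
As a rough analogy only (not Informatica code), the Python sketch below shows two HTTPS listeners built from one shared certificate configuration, which mirrors the relationship described above: enable HTTPS once, and the companion process reuses the same keystore settings. The PEM path and ports are assumptions.

import http.server
import ssl
import threading

# One shared TLS configuration (illustrative path to a PEM bundle that
# contains the certificate and private key). In the product, the real
# settings live in the Data Integration Service process properties, and
# the Spark Jobserver reuses them.
CERT_CHAIN = "/opt/secrets/service_keystore.pem"

def serve_https(port):
    # Build the server-side TLS context from the shared configuration.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=CERT_CHAIN)
    server = http.server.HTTPServer(
        ("0.0.0.0", port), http.server.SimpleHTTPRequestHandler
    )
    server.socket = context.wrap_socket(server.socket, server_side=True)
    server.serve_forever()

# The main service and the companion process listen on different ports
# but share the same certificate material.
threading.Thread(target=serve_https, args=(8443,), daemon=True).start()
serve_https(8444)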
