The Data Preview Service Module manages requests from the Developer tool to preview source or transformation data in a mapping.
When you preview data, the Developer tool sends the request to the Data Integration Service. The Data Integration Service uses the Data Preview Service Module to determine whether to run the job in the native or non-native environment based on the preview point. The preview point is the object in a mapping that you choose to view data for.
Data preview jobs run either on the Data Integration Service, in the native environment, or on the Spark engine, in the non-native environment. The Spark engine runs the job in the following cases:
The preview point or any upstream transformation contains hierarchical data.
The preview point or any upstream transformation is a Python transformation.
The preview point or any upstream transformation is an Expression transformation configured for windowing.
The mapping contains a combination of transformations that must run on the Spark engine.
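The cases above can be sketched as a small decision function. This is only an illustrative model, not the Data Preview Service Module's actual code: the `Transformation` class, its fields, the `mapping_requires_spark` flag, and the returned engine names are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transformation:
    """Hypothetical stand-in for an object in a mapping."""
    name: str
    kind: str = "Generic"               # for example "Python", "Expression", "Filter"
    has_hierarchical_data: bool = False
    windowing: bool = False             # only meaningful for Expression transformations
    upstream: List["Transformation"] = field(default_factory=list)


def upstream_closure(point: Transformation) -> List[Transformation]:
    """Return the preview point plus every transformation upstream of it."""
    seen_ids, ordered, stack = set(), [], [point]
    while stack:
        node = stack.pop()
        if id(node) in seen_ids:
            continue
        seen_ids.add(id(node))
        ordered.append(node)
        stack.extend(node.upstream)
    return ordered


def needs_spark(node: Transformation) -> bool:
    """True if this single object matches one of the per-object cases."""
    return (
        node.has_hierarchical_data
        or node.kind == "Python"
        or (node.kind == "Expression" and node.windowing)
    )


def choose_engine(point: Transformation, mapping_requires_spark: bool = False) -> str:
    """Pick the engine for a preview job.

    mapping_requires_spark is an assumed flag that models the last case:
    a combination of transformations that must run on the Spark engine.
    """
    if mapping_requires_spark:
        return "spark"
    if any(needs_spark(n) for n in upstream_closure(point)):
        return "spark"
    return "data_integration_service"
```

In this model, previewing a Filter transformation that sits downstream of a Python transformation returns `"spark"`, while previewing the source upstream of the Python transformation returns `"data_integration_service"`, because only the preview point and its upstream objects are inspected.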
When the Spark engine runs a data preview job, the job uses either the Spark Jobserver or spark-submit scripts depending on the cluster distribution you configure. If you configure the mapping with a distribution that supports Spark Jobserver, the Data Preview Service Module uses Spark Jobserver to run preview jobs on the Spark engine. Otherwise, the Data Preview Service Module uses a spark-submit script.
For more information about supported cluster distributions, see the Data Engineering Integration User Guide.
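The choice between Spark Jobserver and a spark-submit script can be pictured as a lookup on the configured distribution. The sketch below is an assumption-laden illustration: the function name, the return values, and the distribution names are placeholders, and the real list of distributions that support Spark Jobserver is the one in the user guide.

```python
from typing import FrozenSet

# Placeholder names only. These are NOT real distribution identifiers;
# take the supported list from the product documentation.
JOBSERVER_DISTRIBUTIONS: FrozenSet[str] = frozenset({"example-distro-a", "example-distro-b"})


def choose_spark_launcher(
    distribution: str,
    jobserver_distributions: FrozenSet[str] = JOBSERVER_DISTRIBUTIONS,
) -> str:
    """Return how a Spark preview job would be launched for a distribution.

    Spark Jobserver is used when the distribution supports it;
    otherwise the job falls back to a spark-submit script.
    """
    if distribution in jobserver_distributions:
        return "spark_jobserver"
    return "spark_submit"
```

Keeping the supported set as a parameter rather than hard-coding it reflects that support depends on the configured distribution and can differ between product versions.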
When the Data Integration Service receives a preview request that uses the Spark Jobserver, the Data Preview Service Module starts the Spark Jobserver and passes the mapping to the LDTM. The LDTM generates a Spark workflow, and the Spark Jobserver runs the job on the Hadoop cluster. The data preview job stages the result in the configured HDFS staging directory. The Data Integration Service then passes the staged data to the Developer tool.
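The order of these steps can be sketched as a single function that takes each component as a callable. Everything here is hypothetical scaffolding for illustration: the function and parameter names are invented, and the callables stand in for the Spark Jobserver, the LDTM, the Hadoop cluster, and the HDFS staging directory rather than calling any real Informatica or Spark API.

```python
from typing import Any, Callable


def run_jobserver_preview(
    mapping: Any,
    start_jobserver: Callable[[], Any],
    generate_spark_workflow: Callable[[Any], Any],
    run_on_cluster: Callable[[Any, Any, str], None],
    read_staged_result: Callable[[str], Any],
    staging_dir: str,
) -> Any:
    """Model the Spark Jobserver preview flow as four ordered steps."""
    # 1. The Data Preview Service Module starts the Spark Jobserver.
    jobserver = start_jobserver()
    # 2. The mapping goes to the LDTM, which generates a Spark workflow.
    workflow = generate_spark_workflow(mapping)
    # 3. The Spark Jobserver runs the job on the cluster, and the job
    #    stages its result in the configured staging directory.
    run_on_cluster(jobserver, workflow, staging_dir)
    # 4. The staged data is read back so it can be returned to the client.
    return read_staged_result(staging_dir)
```

Passing the components in as callables keeps the sketch runnable with simple stubs and makes the ordering explicit: the workflow cannot run before it is generated, and the result cannot be returned before it is staged.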