When you preview data on the Spark engine, the following process occurs:
1. The Data Integration Service uses the Data Preview Service Module to determine whether to run the preview job in the native environment or on the Spark engine.
2. The Data Preview Service Module pushes the job to the Spark engine and generates a Spark workflow based on the preview point.
3. Based on the cluster distribution that you configure, either the Spark Jobserver or the DTM submits the Spark workflow tasks to the Hadoop cluster to generate the preview data.
4. The run-time engine stages the data in the configured HDFS staging directory (see the sketch after this list).
5. The Data Integration Service passes the staged data to the Developer tool and then deletes the staged data.
6. The results of the preview appear in the data viewer of the Developer tool.
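The stage-then-clean-up pattern in steps 4 through 6 can be illustrated with a minimal Java sketch that uses the Hadoop FileSystem API. The staging path /tmp/preview_staging, the class name, and the method name are hypothetical placeholders; this is an illustration of the pattern, not the Data Integration Service's internal code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.IOException;

public class StagedPreviewSketch {

    /**
     * Reads staged preview results from an HDFS staging directory,
     * streams them to a consumer, and then removes the staged files,
     * mirroring steps 4 through 6 above. The directory path is a
     * hypothetical example.
     */
    public static void consumeStagedPreview(Configuration conf) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path stagingDir = new Path("/tmp/preview_staging");

        // Stream each staged result file to the client (stdout here).
        for (FileStatus status : fs.listStatus(stagingDir)) {
            try (var in = fs.open(status.getPath())) {
                IOUtils.copyBytes(in, System.out, conf, false);
            }
        }

        // Delete the staged data once it has been passed on (step 5).
        fs.delete(stagingDir, true);
    }
}
```

The staged data is transient by design: it exists only long enough for the Data Integration Service to hand the results to the Developer tool.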
When you run data preview, the Data Integration Service validates the validation environments selected in the Run-time view.
If you enable the HTTPS protocol on the Data Integration Service, the Spark Jobserver also uses the HTTPS protocol. The Spark Jobserver uses the same HTTPS keystore configuration that you set in the Data Integration Service process properties in the Administrator tool.
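To show what a keystore-backed HTTPS configuration involves, here is a minimal Java sketch that loads a keystore and initializes an SSLContext from it. The keystore path, password, and class name are hypothetical placeholders, and the sketch is not the Data Integration Service's implementation; in the product, these values come from the process properties that you set in the Administrator tool.

```java
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import java.io.FileInputStream;
import java.security.KeyStore;

public class KeystoreSslSketch {

    /**
     * Loads a JKS keystore and builds an SSLContext with its key
     * material. The path and password arguments are placeholders for
     * the keystore configuration set on the Data Integration Service.
     */
    public static SSLContext buildSslContext(String keystorePath, char[] password)
            throws Exception {
        KeyStore keyStore = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream(keystorePath)) {
            keyStore.load(in, password);
        }

        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, password);

        SSLContext sslContext = SSLContext.getInstance("TLS");
        sslContext.init(kmf.getKeyManagers(), null, null);
        return sslContext;
    }
}
```

You do not write this code yourself; the point is only that the Spark Jobserver reuses the keystore location and credentials already configured for the service rather than requiring a separate HTTPS setup.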