You can preview data within a mapping in the Developer tool. You can choose sources and transformations in a mapping as preview points. Previewing data helps you design and debug mappings.
You can preview data in streaming mappings configured to run on the following cluster distributions:
Amazon EMR
Cloudera CDH
Cloudera CDP
Dataproc
The following image shows the run-time properties in the Hadoop execution environment for data preview:
When you configure run-time properties for the Hadoop environment to preview data on streaming jobs in the Data Viewer, consider the following properties:
You can specify the rollover size or rollover time in the Execution Environment area of the Developer tool. You configure the rollover size and rollover time the same way you do when you run a mapping. The rollover size is the target file size, in gigabytes (GB), at which to trigger rollover. A value of zero (0) means that the target file does not roll over based on size. Default is 100 bytes. The rollover time is the length of time, in hours, before a target file rolls over. After the time period elapses, the target file rolls over. A value of zero (0) means that the target file does not roll over based on time. Default is 1 hour.
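The rollover rule can be modeled as two independent thresholds, where either one triggers a rollover and a value of zero disables that threshold. The following Python sketch is a hypothetical illustration of that rule, not an Informatica API. The function name `should_roll_over` and its parameters are invented for this example. It assumes that a threshold fires once the current value reaches it. It also assumes that the size and its threshold use the same unit.

```python
def should_roll_over(current_size: float, elapsed_hours: float,
                     rollover_size: float, rollover_time_hours: float) -> bool:
    """Hypothetical model of the rollover rule (not an Informatica API).

    current_size and rollover_size must use the same unit.
    A threshold of 0 disables that criterion.
    """
    # Size criterion: active only when rollover_size is non-zero.
    size_triggered = rollover_size > 0 and current_size >= rollover_size
    # Time criterion: active only when rollover_time_hours is non-zero.
    time_triggered = rollover_time_hours > 0 and elapsed_hours >= rollover_time_hours
    # Either criterion on its own triggers a rollover.
    return size_triggered or time_triggered


if __name__ == "__main__":
    print(should_roll_over(150, 0.2, 100, 1))    # size threshold reached
    print(should_roll_over(50, 0.2, 100, 1))     # neither threshold reached
    print(should_roll_over(50, 1.5, 0, 1))       # size disabled, time reached
    print(should_roll_over(10**9, 5.0, 0, 0))    # both criteria disabled
```

Setting both values to zero in this model means that the file never rolls over.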
You can specify the maximum runtime interval in the Execution Environment area of the Developer tool when you preview data on a streaming job. The maximum runtime interval is the maximum time that the mapping runs before it stops. Default is 2.5 minutes. If you set values for both this property and the Maximum Rows Read property, the mapping stops running as soon as either criterion is met.
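The interaction between the maximum runtime interval and Maximum Rows Read is a "first criterion wins" rule. The following Python sketch simulates that rule over a list of row arrival times. It is a hypothetical model, not Informatica code. The names `simulate_preview` and `DEFAULT_MAX_RUNTIME_SECONDS` are invented for this example. The sketch assumes that arrival times are in seconds from the start of the preview and sorted in ascending order. It also assumes that if the input runs out, no further rows arrive before the runtime interval ends.

```python
from typing import Iterable, Optional, Tuple

# 2.5 minutes, the documented default for the maximum runtime interval.
DEFAULT_MAX_RUNTIME_SECONDS = 2.5 * 60


def simulate_preview(arrival_times: Iterable[float],
                     max_runtime_seconds: float = DEFAULT_MAX_RUNTIME_SECONDS,
                     max_rows_read: Optional[int] = None) -> Tuple[int, str]:
    """Hypothetical model: return (rows_read, stop_reason).

    arrival_times: seconds since the preview started, in ascending order.
    max_rows_read: None models an unset Maximum Rows Read property.
    """
    rows_read = 0
    for t in arrival_times:
        # A row that arrives after the interval has elapsed is never read.
        if t >= max_runtime_seconds:
            return rows_read, "maximum runtime interval"
        rows_read += 1
        # Stop immediately once the row limit is reached.
        if max_rows_read is not None and rows_read >= max_rows_read:
            return rows_read, "maximum rows read"
    # Assumption: no more rows arrive, so the preview waits out the interval.
    return rows_read, "maximum runtime interval"


if __name__ == "__main__":
    print(simulate_preview([1, 2, 3, 200], max_rows_read=10))
    print(simulate_preview(range(0, 100), max_rows_read=5))
```

In the first call, the 200-second row arrives after the 150-second default interval, so the runtime criterion stops the preview after three rows. In the second call, the row limit is reached first.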
Consider the following guidelines when you preview data on any streaming source:
You cannot preview data on an Aggregator transformation.
You cannot connect to the downstream transformation when you map the timestamp port in the Window transformation to the Aggregator transformation.
When you preview a streaming source that contains a timestamp column, the timestamp column appears blank. However, the data appears in the timestamp column when you export the data.
You cannot preview data for a Normalizer transformation or a Router transformation when the transformations are configured with multiple output groups.