Data Preview

You can preview data within a mapping in the Developer tool. You can choose sources and transformations in a mapping as preview points. Previewing data helps you design and debug mappings.
You can preview data in streaming mappings configured to run on the following cluster distributions:
  • Amazon EMR
  • Cloudera CDH
  • Cloudera CDP
  • Dataproc
The following image shows the run-time properties in the Hadoop execution environment for data preview:
[Image: run-time properties in the Hadoop environment for data preview]
When you configure run-time properties for the Hadoop environment to preview data on streaming jobs in the Data Viewer, consider the following properties:
  • You can specify the rollover size or rollover time in the Execution Environment area of the Developer tool. The steps to configure the rollover size and rollover time are similar to the steps that you follow when you run a mapping. The rollover size is the target file size, in gigabytes (GB), at which to trigger rollover. A value of zero (0) means that the target file does not roll over based on size. Default is 100 bytes. The rollover time is the length of time, in hours, for a target file to roll over. After the time period has elapsed, the target file rolls over. A value of zero (0) means that the target file does not roll over based on time. Default is 1 hour.
  • You can specify the maximum runtime interval property in the Execution Environment area of the Developer tool when you perform data preview on a streaming job. The maximum runtime interval is the maximum time to run the mapping before it stops. Default is 2.5 minutes. If you set values for both this property and the Maximum Rows Read property, the mapping stops running when either criterion is met.
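The way these properties interact can be sketched in a few lines of Python. This is an illustrative model only, not Informatica code: the PreviewSettings class, its field names, and both functions are hypothetical, and the sketch assumes that size-based and time-based rollover are evaluated independently and that 1 GB equals 2**30 bytes. The rollover size has no default in the sketch and must be passed explicitly.

```python
from dataclasses import dataclass
from typing import Optional

GB = 1024 ** 3  # assumption: 1 GB is treated as 2**30 bytes


@dataclass
class PreviewSettings:
    """Hypothetical container for the run-time properties described above."""
    rollover_size_gb: float                       # 0 disables size-based rollover
    rollover_time_hours: float = 1.0              # 0 disables time-based rollover
    max_runtime_seconds: Optional[float] = 150.0  # 2.5 minutes
    max_rows_read: Optional[int] = None           # None means "not set"


def should_roll_over(s: PreviewSettings, file_size_bytes: int,
                     file_age_hours: float) -> bool:
    """Return True when the target file reaches the size or the time limit."""
    by_size = s.rollover_size_gb > 0 and file_size_bytes >= s.rollover_size_gb * GB
    by_time = s.rollover_time_hours > 0 and file_age_hours >= s.rollover_time_hours
    return by_size or by_time


def should_stop(s: PreviewSettings, elapsed_seconds: float, rows_read: int) -> bool:
    """Return True as soon as either configured stop criterion is met."""
    by_time = s.max_runtime_seconds is not None and elapsed_seconds >= s.max_runtime_seconds
    by_rows = s.max_rows_read is not None and rows_read >= s.max_rows_read
    return by_time or by_rows
```

For example, with PreviewSettings(rollover_size_gb=1, rollover_time_hours=0, max_rows_read=1000), a 2 GB file rolls over regardless of its age, and the preview stops at 1,000 rows even if only a few seconds have elapsed.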
For more information about configuring a streaming mapping, see Mapping Configurations for Hadoop.
Consider the following guidelines when you preview data on any streaming source:
  • You cannot preview data on an Aggregator transformation.
  • You cannot connect to the downstream transformation when you map the timestamp port in the Window transformation to the Aggregator transformation.
  • When you preview a streaming source that contains a timestamp column, the timestamp column appears blank. However, the data appears in the timestamp column when you export the data.
  • You cannot preview data for a Normalizer transformation or a Router transformation when the transformations are configured with multiple output groups.
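The two guidelines that depend on the transformation type can be expressed as a small check. The following Python sketch is only an illustration: can_preview and its parameters are hypothetical names, not part of the Developer tool or any Informatica API, and the function does not model the timestamp-related guidelines.

```python
def can_preview(transformation_type: str, output_groups: int = 1) -> bool:
    """Return False for preview points that the guidelines above rule out."""
    kind = transformation_type.strip().lower()
    # Data preview is not available on an Aggregator transformation.
    if kind == "aggregator":
        return False
    # Normalizer and Router transformations cannot be previewed
    # when they are configured with multiple output groups.
    if kind in {"normalizer", "router"} and output_groups > 1:
        return False
    return True
```

A call such as can_preview("Router", output_groups=3) returns False, while can_preview("Router") returns True because a single output group is assumed by default.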
For more information about data preview, see the Data Engineering Integration User Guide.