Big Data Management User Guide

Summary Statistics

The Summary Statistics view appears in the details panel when you select a mapping job in the contents panel. The Summary Statistics view displays throughput and resource usage statistics for the job.

You can view the following throughput statistics for the job:

Source. The name of the mapping source instance.

Target name. The name of the target instance.

Rows. The number of rows read for source and target. If the target is Hive, this is the only summary statistic available.

Average Rows/Sec. Average number of rows read per second for source and target.

Bytes. Number of bytes read for source and target.

Average Bytes/Sec. Average number of bytes read per second for source and target.

First Row Accessed. The date and time when the Data Integration Service started reading the first row in the source file.

Dropped rows. Number of source rows that the Data Integration Service did not read.

You can view the throughput statistics for the job in the details pane in the following image:

The Monitor tab in the Administrator tool shows the mapping, script, and Hive query in the Ad Hoc Jobs pane. In the details pane under Summary Statistics, the throughput appears for the source and target. Under the source, the AllHiveSourceTables row appears with all the source statistics, such as first row accessed and dropped rows. Under the target, a row appears with all the target statistics, such as average bytes and rejected rows.

The Hive summary statistics include a row called "AllHiveSourceTables." This row includes records read from the following sources for the MapReduce engine:

Original Hive sources in the mapping.

Staging Hive tables defined by the Hive engine.

Staging data between two linked MapReduce jobs in each query.

If the LDTM session includes one Tez job, the AllHiveSourceTables statistics include only the original Hive sources in the mapping.
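The difference between the two engines can be illustrated with a minimal sketch. It assumes you already know the record counts for each category. The function name, parameter names, and engine labels are hypothetical and are not part of the product.

```python
def all_hive_source_rows(original: int, staging_tables: int,
                         inter_job_staging: int, engine: str) -> int:
    """Illustrative only: records counted in the AllHiveSourceTables row.

    original          -- records read from the original Hive sources in the mapping
    staging_tables    -- records read from staging Hive tables defined by the Hive engine
    inter_job_staging -- records staged between two linked MapReduce jobs in each query
    engine            -- "mapreduce" or "tez" (hypothetical labels)
    """
    if engine == "mapreduce":
        # MapReduce: all three categories are included in the row.
        return original + staging_tables + inter_job_staging
    if engine == "tez":
        # Tez: only the original Hive sources are included.
        return original
    raise ValueError(f"unknown engine: {engine}")


print(all_hive_source_rows(1000, 200, 50, "mapreduce"))  # 1250
print(all_hive_source_rows(1000, 200, 50, "tez"))        # 1000
```

With the same mapping, the MapReduce total can therefore exceed the number of rows in the original sources, while the Tez total does not.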

When a mapping contains customized data objects or logical data objects, the summary statistics display the original source data instead of the customized data objects or logical data objects in the Administrator tool and in the session log. The Hive driver reads data from the original source data.

You can view the Tez job statistics in the Administrator tool when reading and writing to Hive tables that the Spark engine launches in any of the following scenarios:

You have resources present in the Amazon buckets.

You have transactional Hive tables.

You have table columns secured with fine-grained SQL authorization.

Incorrect statistics appear for all the Hive sources and targets: average rows per second, bytes, average bytes per second, and rejected rows show zero. Only the processed rows column contains correct values. The remaining columns contain either 0 or N/A.

When an Update Strategy transformation runs on the Hive engine, the Summary Statistics for the target table instance combines the number of inserted rows processed, deleted rows processed, and twice the number of updated rows processed. The update operations are handled as separate delete and insert operations.
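The arithmetic behind the combined target count can be sketched as follows. The function and parameter names are hypothetical and serve only to illustrate the rule described above.

```python
def target_rows_reported(inserted: int, deleted: int, updated: int) -> int:
    """Illustrative only: combined row count shown for the target instance.

    Each update is handled as one delete plus one insert,
    so an updated row is counted twice.
    """
    return inserted + deleted + 2 * updated


# 100 inserts, 20 deletes, and 30 updates: 100 + 20 + 2 * 30
print(target_rows_reported(inserted=100, deleted=20, updated=30))  # 180
```

In this example the target shows 180 rows even though only 150 row operations were requested, because each of the 30 updates contributes two rows.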
