PowerExchange for HBase User Guide

HBase Mapping Example

Your organization is a mobile service provider that needs to load the data in WAP log files to HBase tables and generate multiple reports.
WAP log files can contain columns with information about the mobile users, internet usage, and data volume. In a single day, the WAP log files can contain three to four billion rows of data and total around two terabytes in size.
You can consolidate the data in the WAP log files that you receive through the day. You can then perform transformations based on your requirements.
The following figure shows the HBase mapping example:
The HBase mapping example contains a WAP log file as a source flat file data object, transformations to filter, sort, and aggregate the input data, a transformation to generate the row ID, and an HBase data object write operation.
You can use the following objects in an HBase mapping:
Flat File Data Object
The source for the mapping is a flat file data object that contains the data in a WAP log file.
Create a flat file data object and specify the WAP log file as the resource for the data object. Source columns in the flat file data object include Province ID, data volume, URL, and session duration. Configure the read properties of the data object.
Transformations
Add transformations to get aggregate data about the internet usage of the mobile users in a particular province.
  • The ftr_Province_ID Filter transformation filters the data in the log files based on the value you specify for the province ID column.
    The Data Integration Service returns the rows that meet the filter condition.
  • The srt_Records Sorter transformation sorts the data in ascending order based on the province ID.
  • The agg_Records Aggregator transformation collects statistics about internet usage and data volume of the mobile users for a particular province.
    Use the result of the Sorter transformation as an input to the Aggregator transformation. You can increase Aggregator transformation performance with the sorted input option.
  • The gen_UUID_as_ROWID Java transformation generates a unique row key ID before you load the data to HBase tables.
    Each row in an HBase table has a unique row key ID. You can write the generated key value as the row key ID for each row in the HBase table.
  • The Expression transformation formats the data before you load it to the HBase table.
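The logic of the transformations above can be illustrated outside the Developer tool with a minimal Java sketch. It is not the mapping itself and does not use any Informatica API. The class name WapLogSketch, the LogRow and ProvinceStats types, their field names, and the sample values are all hypothetical. The sketch also assumes that the filter condition is a set of province IDs and that the aggregate statistics are a row count and two sums. It shows why sorted input helps the aggregation step: because rows for the same province arrive together, one pass with a single running group is enough.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public class WapLogSketch {

    // Hypothetical row type; the fields mirror the source columns named in the example.
    public static final class LogRow {
        public final String provinceId;
        public final long dataVolume;
        public final String url;
        public final long sessionSeconds;

        public LogRow(String provinceId, long dataVolume, String url, long sessionSeconds) {
            this.provinceId = provinceId;
            this.dataVolume = dataVolume;
            this.url = url;
            this.sessionSeconds = sessionSeconds;
        }
    }

    // Hypothetical per-province summary, identified by a generated row key.
    public static final class ProvinceStats {
        public final String rowKey;
        public final String provinceId;
        public final long rowCount;
        public final long totalDataVolume;
        public final long totalSessionSeconds;

        public ProvinceStats(String rowKey, String provinceId, long rowCount,
                             long totalDataVolume, long totalSessionSeconds) {
            this.rowKey = rowKey;
            this.provinceId = provinceId;
            this.rowCount = rowCount;
            this.totalDataVolume = totalDataVolume;
            this.totalSessionSeconds = totalSessionSeconds;
        }
    }

    // Filter step: keep only rows whose province ID is in the requested set (assumed condition).
    public static List<LogRow> filterByProvince(List<LogRow> rows, Set<String> provinceIds) {
        List<LogRow> kept = new ArrayList<>();
        for (LogRow row : rows) {
            if (provinceIds.contains(row.provinceId)) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Sort step: ascending order by province ID; the input list is left unchanged.
    public static List<LogRow> sortByProvince(List<LogRow> rows) {
        List<LogRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((LogRow r) -> r.provinceId));
        return sorted;
    }

    // Row key step: a random UUID string stands in for the generated row key ID.
    public static String newRowKey() {
        return UUID.randomUUID().toString();
    }

    // Aggregate step: expects input sorted by province ID, so each group is contiguous.
    public static List<ProvinceStats> aggregateSorted(List<LogRow> sortedRows) {
        List<ProvinceStats> out = new ArrayList<>();
        String current = null;
        long count = 0, volume = 0, seconds = 0;
        for (LogRow row : sortedRows) {
            if (current != null && !row.provinceId.equals(current)) {
                // The province changed, so the previous group is complete.
                out.add(new ProvinceStats(newRowKey(), current, count, volume, seconds));
                count = 0; volume = 0; seconds = 0;
            }
            current = row.provinceId;
            count++;
            volume += row.dataVolume;
            seconds += row.sessionSeconds;
        }
        if (current != null) {
            out.add(new ProvinceStats(newRowKey(), current, count, volume, seconds));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration only.
        List<LogRow> rows = List.of(
            new LogRow("P02", 100, "http://example.com/a", 10),
            new LogRow("P01", 50, "http://example.com/b", 5),
            new LogRow("P03", 70, "http://example.com/c", 8),
            new LogRow("P01", 25, "http://example.com/d", 7));
        List<LogRow> sorted = sortByProvince(filterByProvince(rows, Set.of("P01", "P02")));
        for (ProvinceStats s : aggregateSorted(sorted)) {
            System.out.println(s.rowKey + " " + s.provinceId + " rows=" + s.rowCount
                + " volume=" + s.totalDataVolume + " seconds=" + s.totalSessionSeconds);
        }
    }
}
```

If the rows were not sorted, aggregateSorted would emit a separate summary each time a province ID reappeared, which is why the sketch always sorts before it aggregates.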
HBase Data Object
The target of the mapping is an HBase data object. Specify the columns in the HBase table to which you want to write the data.
Create an HBase data object write operation to write data to the HBase table.
After you run the mapping, the Data Integration Service writes the transformed data to the HBase table. Analysts can run queries and perform real-time analysis of daily operations, statistics about gateway usage, and data volume based on the data in the HBase tables.