Table of Contents

  1. Preface
  2. Introduction to Mass Ingestion
  3. Prepare
  4. Create
  5. Deploy
  6. Run
  7. Monitor
  8. Appendix A: infacmd mi Command Reference

Mass Ingestion Guide

Use Case

You work in an IT group for a commercial bank. Your team is taking on a new project to personalize the rewards program that you offer to customers who open checking and savings accounts at your bank.
You plan to collect and analyze data on your customers to understand the types of rewards that customers are interested in. For example, one customer might be interested in saving money on groceries while another customer might be interested in travel deals.
You collect data on customer demographics, lifestyle metrics, income, transaction history, spending habits, online presence, interests, opinions, and brand knowledge. The data is collected through different media, including customer logs on file with the bank, point-of-sale systems at companies that the bank partners with, social media interactions, and customer weblogs.
The following image shows the types of data that you collect and the media that you use to collect them:
This image displays icons that depict how the different types of data are collected through the following media: customer logs on file with the bank, point-of-sale systems, social media interactions, and customer weblogs.
After the data is collected, it is stored in the bank's corporate data center, which includes various relational databases.
The following image shows how the data might be stored:
This image shows the different types of data pointing to the corporate data center where the data is stored.
Before your data analysts can begin working with the data, you need to ingest the data from the relational databases into Amazon S3 buckets. However, you cannot spend the time and resources required to ingest such large amounts of data manually. You would have to develop numerous mappings and parameter sets to make sure that the data is ingested properly, make sure that you do not ingest sensitive customer information such as credit card numbers, and then maintain the mappings whenever the relational schemas change.
Instead of manually creating and running mappings, you can use the Mass Ingestion tool to create one mass ingestion specification that ingests all of your data at once. You have to specify only the source, the target, and any parameters to apply across source tables. When you deploy and run the specification, the Spark engine ingests all of the data to Amazon S3.
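For example, after you create the specification in the Mass Ingestion tool, you can deploy and run it from the command line with the infacmd mi commands described in Appendix A. The following sketch is only an illustration: the domain name, user name, service name, and specification name shown here are assumptions for this use case, and the option abbreviations follow common infacmd conventions, so see Appendix A: infacmd mi Command Reference for the exact syntax.

    infacmd mi deploySpec -dn BankDomain -un Administrator -pd <password> -sn MIS -spn CustomerData_Ingest
    infacmd mi runSpec -dn BankDomain -un Administrator -pd <password> -sn MIS -spn CustomerData_Ingest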
The following image shows how mass ingestion bridges the gap between the data that the bank stores in its relational databases and the Amazon S3 buckets:
This image shows mass ingestion between the corporate data center and the Amazon S3 buckets, ingesting data from the corporate data center into the S3 buckets.
Mass ingestion saves you a lot of time and resources, and your data analysts have more time to analyze the data and develop a new system for the bank's rewards program.
