Table of Contents

  1. Preface
  2. Introduction to Big Data Streaming
  3. Big Data Streaming Configuration
  4. Sources in a Streaming Mapping
  5. Targets in a Streaming Mapping
  6. Streaming Mappings
  7. Window Transformation
  8. Appendix A: Connections
  9. Appendix B: Data Type Reference
  10. Appendix C: Sample Files

Big Data Streaming User Guide

Big Data Streaming Overview

Use Informatica Big Data Streaming to prepare and process streams of data in real time and uncover insights in time to meet your business needs. Big Data Streaming provides pre-built connectors, such as Kafka, Amazon Kinesis, HDFS, and enterprise messaging systems, and pre-built data transformations to enable a code-free method of defining data integration logic.
Big Data Streaming builds on the best of open source technologies. It uses Spark Streaming for stream processing, and supports other open source stream processing platforms and frameworks, such as Kafka and Hadoop.
Create streaming mappings to collect the streamed data, build the business logic for the data, and push the logic to a Spark engine for processing. The Spark engine uses Spark Streaming to process the data: it reads the data, divides it into micro batches, processes the batches, and publishes the results.
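Although the mapping logic itself is defined code-free in the Developer tool, it can help to picture the micro-batch model that the Spark engine applies to a stream. The following sketch is not Informatica code; it is a minimal, hypothetical Spark Streaming job in Scala, with a local master, a socket source, and a five-second batch interval chosen only as placeholders, that shows how a stream is divided into micro batches, processed, and published.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object MicroBatchSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical local configuration; a deployed mapping runs on a YARN-managed cluster.
        val conf = new SparkConf().setAppName("MicroBatchSketch").setMaster("local[2]")

        // Each micro batch covers a five-second slice of the incoming stream.
        val ssc = new StreamingContext(conf, Seconds(5))

        // Hypothetical socket source standing in for a messaging source such as Kafka or JMS.
        val lines = ssc.socketTextStream("localhost", 9999)

        // Business logic applied to every micro batch: count occurrences of each message.
        val counts = lines.map(line => (line, 1)).reduceByKey(_ + _)

        // Publish the processed batch; a streaming mapping writes to a configured target instead.
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

In a streaming mapping, the equivalent choices, such as the source, the batch interval, and the target, are made through connections and run-time properties rather than code.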
You can create streaming mappings to stream machine, device, and social media data in the form of messages. You can stream data from sources such as JMS providers, Apache Kafka brokers, Amazon Kinesis streams, Microsoft Azure Event Hubs, and MapR streams. Use a Messaging connection to access the data as it becomes available.
You can stream the following types of data:
  • Application and infrastructure log data
  • Change data (CDC) from databases
  • Clickstreams from web servers
  • Geo-spatial data from devices
  • Sensor data
  • Time series data
  • Supervisory Control and Data Acquisition (SCADA) data
  • Message bus data
  • Programmable logic controller (PLC) data
  • Point of sale data from devices
You can stream data to different types of targets, such as Kafka, HDFS, Amazon Kinesis Firehose, HBase tables, Hive tables, JDBC-compliant databases, Microsoft Azure Event Hubs, Azure Data Lake Store, MapR-DB, and MapR streams.
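To make the end-to-end flow concrete, the following hypothetical sketch, again Scala for Spark rather than Informatica mapping code, reads messages from a Kafka topic with Spark Structured Streaming and writes them to HDFS. The broker address, topic name, and HDFS paths are placeholders, and the sketch assumes the Spark Kafka integration package is on the classpath.

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("KafkaToHdfsSketch").getOrCreate()

        // Hypothetical broker and topic; a streaming mapping supplies these through
        // its Kafka connection and data object rather than code.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .load()

        // Keep the raw message payload as a string column.
        val payloads = events.selectExpr("CAST(value AS STRING) AS payload")

        // Hypothetical HDFS output and checkpoint locations.
        val query = payloads.writeStream
          .format("parquet")
          .option("path", "hdfs:///streams/clickstream/output")
          .option("checkpointLocation", "hdfs:///streams/clickstream/checkpoints")
          .outputMode("append")
          .start()

        query.awaitTermination()
      }
    }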
Big Data Streaming works with Informatica Big Data Management to provide streaming capabilities. Big Data Streaming uses Spark Streaming to process streamed data. It uses YARN to manage the resources on a Spark cluster more efficiently and uses third-party distributions to connect to and push job processing to a Hadoop environment.
Use Informatica Developer (the Developer tool) to create streaming mappings. Use the Hadoop run-time environment and the Spark engine to run the mapping. You can configure high availability to run the streaming mappings on the Hadoop cluster.
For more information about running mappings on the Spark engine, see the Informatica Big Data Management User Guide.
