Table of Contents

  1. Preface
  2. Introduction to Big Data Streaming
  3. Big Data Streaming Configuration
  4. Sources in a Streaming Mapping
  5. Targets in a Streaming Mapping
  6. Streaming Mappings
  7. Window Transformation
  8. Appendix A: Connections
  9. Appendix B: Data Type Reference
  10. Appendix C: Sample Files

Big Data Streaming User Guide

Big Data Streaming Overview

Use Informatica Big Data Streaming to prepare and process streams of data in real time and uncover insights in time to meet your business needs. Big Data Streaming provides pre-built connectors, such as Kafka, Amazon Kinesis, HDFS, and enterprise messaging systems, and pre-built data transformations to enable a code-free method of defining data integration logic.
Big Data Streaming builds on the best of open source technologies. It uses Spark Streaming for stream processing, and supports other open source stream processing platforms and frameworks, such as Kafka and Hadoop.
Create streaming mappings to collect the streamed data, build the business logic for the data, and push the logic to a Spark engine for processing. The Spark engine uses Spark Streaming to process the data: it reads the data, divides it into micro batches, processes the batches, and publishes the results.
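Although the mapping logic itself is defined code-free in the Developer tool, it can help to picture the micro-batch model that the Spark engine applies to a stream. The following sketch is not Informatica code; it is a minimal, hypothetical Spark Streaming job in Scala, with a local master, a socket source, and a five-second batch interval chosen only as placeholders, that shows how a stream is divided into micro batches, processed, and published.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object MicroBatchSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical local configuration; a deployed mapping runs on a YARN-managed cluster.
        val conf = new SparkConf().setAppName("MicroBatchSketch").setMaster("local[2]")

        // Each micro batch covers a five-second slice of the incoming stream.
        val ssc = new StreamingContext(conf, Seconds(5))

        // Hypothetical socket source standing in for a messaging source such as Kafka or JMS.
        val lines = ssc.socketTextStream("localhost", 9999)

        // Business logic applied to every micro batch: count occurrences of each message.
        val counts = lines.map(line => (line, 1)).reduceByKey(_ + _)

        // Publish the processed batch; a streaming mapping writes to a configured target instead.
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

In a streaming mapping, the equivalent choices, such as the source, the batch interval, and the target, are made through connections and run-time properties rather than code.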
You can create streaming mappings to stream machine, device, and social media data in the form of messages. You can stream data from sources such as JMS providers, Apache Kafka brokers, Amazon Kinesis streams, Microsoft Azure Event Hubs, and MapR streams. Use a Messaging connection to access the data as it becomes available.
You can stream the following types of data:
  • Application and infrastructure log data
  • Change data (CDC) from databases
  • Clickstreams from web servers
  • Geo-spatial data from devices
  • Sensor data
  • Time series data
  • Supervisory Control and Data Acquisition (SCADA) data
  • Message bus data
  • Programmable logic controller (PLC) data
  • Point of sale data from devices
You can stream data to different types of targets, such as Kafka, HDFS, Amazon Kinesis Firehose, HBase tables, Hive tables, JDBC-compliant databases, Microsoft Azure Event Hubs, Azure Data Lake Store, MapR-DB, and MapR streams.
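To make the end-to-end flow concrete, the following hypothetical sketch, again Scala for Spark rather than Informatica mapping code, reads messages from a Kafka topic with Spark Structured Streaming and writes them to HDFS. The broker address, topic name, and HDFS paths are placeholders, and the sketch assumes the Spark Kafka integration package is on the classpath.

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("KafkaToHdfsSketch").getOrCreate()

        // Hypothetical broker and topic; a streaming mapping supplies these through
        // its Kafka connection and data object rather than code.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .load()

        // Keep the raw message payload as a string column.
        val payloads = events.selectExpr("CAST(value AS STRING) AS payload")

        // Hypothetical HDFS output and checkpoint locations.
        val query = payloads.writeStream
          .format("parquet")
          .option("path", "hdfs:///streams/clickstream/output")
          .option("checkpointLocation", "hdfs:///streams/clickstream/checkpoints")
          .outputMode("append")
          .start()

        query.awaitTermination()
      }
    }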
Big Data Streaming works with Informatica Big Data Management to provide streaming capabilities. Big Data Streaming uses Spark Streaming to process streamed data. It uses YARN to manage the resources on a Spark cluster more efficiently and uses third-party distributions to connect to and push job processing to a Hadoop environment.
Use Informatica Developer (the Developer tool) to create streaming mappings. Use the Hadoop run-time environment and the Spark engine to run the mapping. You can configure high availability to run the streaming mappings on the Hadoop cluster.
For more information about running mappings on the Spark engine, see the Informatica Big Data Management User Guide.
