Use Informatica Big Data Streaming to prepare and process streams of data in real time and uncover insights in time to meet your business needs. Big Data Streaming provides pre-built connectors, such as Kafka, Amazon Kinesis, HDFS, and enterprise messaging systems, as well as data transformations, to enable a code-free method of defining data integration logic.
Big Data Streaming builds on the best of open source technologies. It uses Spark Structured Streaming for stream processing, and supports other open source stream processing platforms and frameworks, such as Kafka and Hadoop. Spark Structured Streaming is a scalable and fault-tolerant open source stream processing engine built on the Spark engine.
You can create streaming mappings to stream machine, device, and social media data in the form of messages. A streaming mapping builds the business logic for the data and pushes the logic to the Spark engine for processing. Use a Messaging connection to get data from Apache Kafka brokers, Amazon Kinesis, and Microsoft Azure Event Hubs.
The Spark engine runs the streaming mapping continuously. It reads the data, divides the data into micro batches, processes the batches, updates the results in a result table, and then writes the results to a target.
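The micro-batch loop described above can be sketched in plain Python. This is an illustrative simulation only, not the actual Spark engine; the record format (device, value) and the per-device event-count logic are assumptions made for the example:

```python
from collections import defaultdict

def micro_batches(stream, batch_size):
    """Divide an incoming stream of records into micro batches."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def run_streaming_mapping(stream, batch_size=3):
    """Read data, process each micro batch, update a result table,
    and write the updated results to a target after every batch."""
    result_table = defaultdict(int)   # running counts per device
    target = []                       # stands in for a real sink
    for batch in micro_batches(stream, batch_size):
        for device, value in batch:   # process: accumulate events per device
            result_table[device] += value
        target.append(dict(result_table))  # write current results to target
    return target

# Each element of the returned list is the result table as of that batch.
events = [("sensor-a", 1), ("sensor-b", 1), ("sensor-a", 1),
          ("sensor-b", 1), ("sensor-a", 1)]
print(run_streaming_mapping(events, batch_size=3))
# → [{'sensor-a': 2, 'sensor-b': 1}, {'sensor-a': 3, 'sensor-b': 2}]
```

In a real deployment this loop runs continuously against an unbounded message stream; the sketch uses a finite list so the behavior of the result table across batches is easy to follow.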
You can stream the following types of data:
Application and infrastructure log data
Change data capture (CDC) data from databases
Clickstreams from web servers
Geo-spatial data from devices
Time series data
Supervisory control and data acquisition (SCADA) data
Message bus data
Programmable logic controller (PLC) data
Point-of-sale data from devices
You can stream data to different types of targets, such as Kafka, HDFS, Amazon Kinesis Firehose, Amazon S3, HBase tables, Hive tables, JDBC-compliant databases, Microsoft Azure Event Hubs, and Azure Data Lake Store.
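To illustrate the target-write step for one of these target types, here is a hedged sketch that uses Python's built-in sqlite3 module as a stand-in for a JDBC-compliant database. The table name, schema, and upsert logic are assumptions made for this example, not Big Data Streaming behavior:

```python
import sqlite3

# An in-memory SQLite database stands in for a JDBC-compliant target;
# the device_counts table and its schema are assumptions for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_counts (device TEXT PRIMARY KEY, events INTEGER)")

def write_batch_results(conn, results):
    """Upsert the latest result-table rows into the target after a micro batch."""
    conn.executemany(
        "INSERT INTO device_counts (device, events) VALUES (?, ?) "
        "ON CONFLICT(device) DO UPDATE SET events = excluded.events",
        results.items(),
    )
    conn.commit()

write_batch_results(conn, {"sensor-a": 2, "sensor-b": 1})   # after first micro batch
write_batch_results(conn, {"sensor-a": 3, "sensor-b": 2})   # after second micro batch
print(conn.execute("SELECT device, events FROM device_counts ORDER BY device").fetchall())
# → [('sensor-a', 3), ('sensor-b', 2)]
```

The upsert keeps the target consistent with the latest state of the result table, so each micro batch overwrites earlier counts rather than appending duplicate rows.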
Big Data Streaming works with Informatica Big Data Management to provide streaming capabilities. Big Data Streaming uses Spark Structured Streaming to process streamed data. It uses YARN to manage the resources on a Spark cluster more efficiently, and uses third-party distributions to connect to and push job processing to a Hadoop environment.
Use Informatica Developer (the Developer tool) to create streaming mappings. Use the Hadoop run-time environment and the Spark engine to run the mapping. You can configure high availability to run the streaming mappings on the Hadoop cluster.
For more information about running mappings on the Spark engine, see the