You run the IT department of a major bank that has millions of customers. You want to monitor network activity in real time, so you need to collect network activity data from sources such as firewalls and network devices to improve security and prevent attacks. The data includes records of Denial of Service (DoS) attacks and failed customer login attempts, and it is written to Kafka queues.
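The exact message format depends on the upstream firewalls and devices. As a rough illustration only, a producer that writes failed-login and DoS events to a queue might look like the following sketch. It uses the kafka-python package, and the broker address, topic name (network_activity), and field names are all assumptions made for the example.

```python
import json
import time

from kafka import KafkaProducer  # kafka-python package

# Hypothetical broker address; adjust to your Kafka cluster.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# Two illustrative event shapes: a failed login attempt and a suspected DoS alert.
events = [
    {"event_type": "failed_login", "customer_id": "C102938",
     "source_ip": "203.0.113.25", "timestamp": time.time()},
    {"event_type": "dos_alert", "source_device": "firewall-01",
     "source_ip": "198.51.100.7", "packets_per_sec": 92000,
     "timestamp": time.time()},
]

# Hypothetical topic name standing in for the network activity queue.
for event in events:
    producer.send("network_activity", event)

producer.flush()
```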
Create a Streaming mapping to read the network activity data and write the data to HDFS.
In a Hadoop environment, you can use the following objects in the Streaming mapping:
Kafka data object
The input source is a Kafka queue that contains the network activity data.
Create a Kafka data object. Configure a Kafka connection and specify the queue that contains the network activity data as a resource for the data object. Create a data object read operation and configure the properties. Drag the data object into the mapping as a source data object.
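Before you configure the data object, it can help to confirm the queue name and message format. The following sketch is independent of the Developer tool; it assumes the kafka-python package and the same hypothetical broker and topic as above, and simply prints a few messages from the queue.

```python
import json

from kafka import KafkaConsumer  # kafka-python package

# Hypothetical broker address and topic name; adjust them to match the
# Kafka connection and resource you configure on the data object.
consumer = KafkaConsumer(
    "network_activity",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if the queue goes idle
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# Print a handful of events to confirm the queue and the message format.
for i, message in enumerate(consumer):
    print(message.topic, message.offset, message.value)
    if i >= 9:
        break

consumer.close()
```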
Transformations
Add a Lookup transformation to retrieve data for a particular customer ID. Add a Window transformation to accumulate the streamed data into data groups before processing it.
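The plain-Python sketch below is only meant to illustrate the idea behind these two transformations, not how the Data Integration Service implements them: events are grouped into fixed time windows, and each event is then enriched from a small, hypothetical customer lookup table keyed by customer ID.

```python
from collections import defaultdict

# Hypothetical reference data, standing in for the Lookup transformation's source.
customer_lookup = {
    "C102938": {"name": "A. Rivera", "risk_tier": "high"},
    "C554872": {"name": "B. Chen",   "risk_tier": "low"},
}

WINDOW_SECONDS = 60  # tumbling window size, analogous to the Window transformation


def window_and_enrich(events):
    """Group streamed events into fixed time windows, then enrich each
    event with customer details looked up by customer ID."""
    windows = defaultdict(list)
    for event in events:
        window_start = int(event["timestamp"]) // WINDOW_SECONDS * WINDOW_SECONDS
        windows[window_start].append(event)

    for window_start, grouped in sorted(windows.items()):
        for event in grouped:
            details = customer_lookup.get(event.get("customer_id"), {})
            yield {**event, **details, "window_start": window_start}


sample = [
    {"event_type": "failed_login", "customer_id": "C102938", "timestamp": 1700000005},
    {"event_type": "failed_login", "customer_id": "C102938", "timestamp": 1700000042},
    {"event_type": "failed_login", "customer_id": "C554872", "timestamp": 1700000071},
]

for enriched in window_and_enrich(sample):
    print(enriched)
```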
HDFS complex file data object
Create a complex file data object. Configure an HDFS connection to write to an HDFS sequence file. Create the data object write operation and configure the properties. Drag the data object into the mapping as a target data object.
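If you want to produce or test a comparable HDFS sequence file outside the Developer tool, one option is PySpark. This is not what the complex file data object does internally; it is a standalone sketch in which the HDFS path, the choice of customer ID as the key, and the JSON values are all assumptions.

```python
from pyspark import SparkContext

# Standalone sketch of writing a Hadoop sequence file to HDFS with PySpark.
sc = SparkContext(appName="network-activity-sequencefile-sketch")

# Key the records by customer ID and keep the raw event JSON as the value.
pairs = [
    ("C102938", '{"event_type": "failed_login", "source_ip": "203.0.113.25"}'),
    ("C554872", '{"event_type": "failed_login", "source_ip": "198.51.100.7"}'),
]

# Writes a sequence file of (Text, Text) records to a hypothetical HDFS path.
sc.parallelize(pairs).saveAsSequenceFile("hdfs:///data/network_activity/seq")

sc.stop()
```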
Link ports between mapping objects to create a flow of data.
When you run the mapping, the data is read from the Kafka queue and written to the HDFS sequence file.
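To spot-check the output, you could read the sequence file back with PySpark. The path below is the same hypothetical one used in the earlier sketch; replace it with the target location configured on the write operation.

```python
from pyspark import SparkContext

sc = SparkContext(appName="verify-sequencefile-sketch")

# Read the sequence file back as (key, value) pairs and print a sample.
records = sc.sequenceFile("hdfs:///data/network_activity/seq")
for key, value in records.take(10):
    print(key, value)

sc.stop()
```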