Connecting to an HDFS Cluster from Informatica Vibe Data Stream for Machine Data 2.1.0

Overview

Use Informatica Vibe Data Stream for Machine Data (VDS) to collect data from different types of sources, such as event logs and real-time logs, and to write this data to the Hadoop Distributed File System (HDFS) on a Hadoop distribution. To write data to an HDFS or Hadoop cluster, use the built-in HDFS target service when you create and deploy a data flow in VDS.
To connect to a Hadoop cluster, install the client libraries of the Hadoop distribution that the cluster uses. Install these libraries on the machine on which the VDS Node with the HDFS target service runs. VDS does not include HDFS client libraries, so you must download and install them yourself.
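Because the client libraries are installed manually, it can help to confirm that they are present before you deploy a data flow. The following Python sketch assumes that the client libraries are JAR files named `hadoop-*.jar` (for example `hadoop-common` and `hadoop-hdfs` JARs) somewhere under an installation directory that you choose. The function names are hypothetical helpers, not part of VDS, and the actual file names and layout vary by distribution.

```python
import glob
import os


def find_hadoop_client_jars(lib_dir):
    """Return the sorted paths of Hadoop client JAR files found under lib_dir.

    Assumption: the client libraries are JAR files whose names start with
    "hadoop-". Real file names and directory layouts depend on the
    Hadoop distribution.
    """
    if not os.path.isdir(lib_dir):
        raise FileNotFoundError(f"Library directory not found: {lib_dir}")
    # "**" with recursive=True matches lib_dir itself and all subdirectories,
    # because distributions often nest JARs in subfolders.
    pattern = os.path.join(lib_dir, "**", "hadoop-*.jar")
    return sorted(glob.glob(pattern, recursive=True))


def has_hdfs_client(lib_dir):
    """Return True if both a hadoop-common and a hadoop-hdfs JAR are present."""
    names = [os.path.basename(p) for p in find_hadoop_client_jars(lib_dir)]
    has_common = any(n.startswith("hadoop-common") for n in names)
    has_hdfs = any(n.startswith("hadoop-hdfs") for n in names)
    return has_common and has_hdfs
```

For example, `has_hdfs_client("/opt/hadoop")` checks a hypothetical `/opt/hadoop` installation directory and raises `FileNotFoundError` if that directory does not exist.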
To connect to a Hadoop cluster and write data, perform the following tasks:
  1. Download and install the HDFS client libraries that the Hadoop cluster uses.
  2. Verify the connection to the Hadoop cluster from the data target machine.
  3. Configure the HADOOPBASEDIR environment variable.
  4. Create a data flow with the HDFS target service and deploy the data flow.
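Steps 2 and 3 can be partly checked from a script on the data target machine. The following Python sketch assumes that HADOOPBASEDIR names the directory where you installed the client libraries, and that you know the NameNode host and RPC port from the cluster configuration (the `fs.defaultFS` property in `core-site.xml`; the port is commonly 8020). Both helper functions are hypothetical and not part of VDS. A successful TCP connection only shows that the NameNode is reachable over the network; it does not confirm that authentication or writes will succeed.

```python
import os
import socket


def check_hadoop_base_dir(env=None):
    """Return the HADOOPBASEDIR value after checking that it names a directory.

    Assumption: HADOOPBASEDIR points to the client library installation
    directory. env defaults to os.environ; pass a dict to check other values.
    """
    env = os.environ if env is None else env
    base_dir = env.get("HADOOPBASEDIR")
    if not base_dir:
        raise RuntimeError("HADOOPBASEDIR is not set")
    if not os.path.isdir(base_dir):
        raise NotADirectoryError(
            f"HADOOPBASEDIR does not point to a directory: {base_dir}"
        )
    return base_dir


def namenode_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to the NameNode RPC address succeeds.

    This only tests network reachability (DNS, routing, firewalls); it does
    not authenticate or perform any HDFS operation.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, `namenode_reachable("namenode.example.com", 8020)` (a hypothetical host name) returns `False` if a firewall blocks the port or the host name does not resolve.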
Sample Deployment
The following image shows a sample deployment in which Vibe Data Stream writes data to a Hadoop cluster.
In the sample deployment, a source service runs on a VDS Node on the data source machine, and the HDFS target service runs on a VDS Node on the data target machine. The source service reads data from the source and publishes it. The HDFS target service reads the data that the source service publishes and writes it to the Hadoop cluster. The data target machine is a client machine outside the Hadoop cluster. The Hadoop cluster consists of multiple machines that make up the distributed file system, including the primary NameNode and the DataNodes. The NameNode manages the file system metadata, and the DataNodes store the data.
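To make the roles in the sample deployment concrete, the following Python sketch models them in a single process. It is a toy model under stated assumptions, not VDS or HDFS code: an in-memory queue stands in for the publish/subscribe transport between the source service and the HDFS target service, blocks are placed round-robin with no replication, and every class and function name is hypothetical.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Holds file system metadata only: which blocks make up each file and where."""
    block_map: dict = field(default_factory=dict)  # path -> [(block_id, datanode_index)]


@dataclass
class DataNode:
    """Holds block contents only."""
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


class ToyHdfsCluster:
    def __init__(self, num_datanodes=3, block_size=8):
        self.namenode = NameNode()
        self.datanodes = [DataNode() for _ in range(num_datanodes)]
        self.block_size = block_size
        self._next_block = 0

    def write(self, path, data):
        """Append data to path: split it into blocks, store each block on a
        DataNode (round-robin, no replication), and record it on the NameNode.
        Unlike real HDFS, a partly filled last block is never topped up."""
        entries = self.namenode.block_map.setdefault(path, [])
        for start in range(0, len(data), self.block_size):
            block_id = self._next_block
            self._next_block += 1
            node_index = block_id % len(self.datanodes)
            self.datanodes[node_index].blocks[block_id] = data[start:start + self.block_size]
            entries.append((block_id, node_index))

    def read(self, path):
        """Look up block locations on the NameNode, then fetch blocks from DataNodes."""
        return b"".join(
            self.datanodes[node].blocks[block_id]
            for block_id, node in self.namenode.block_map.get(path, [])
        )


def source_service(records, channel):
    """Publish each record read from the source, then an end-of-stream marker."""
    for record in records:
        channel.put(record)
    channel.put(None)  # toy end-of-stream convention


def hdfs_target_service(channel, cluster, path):
    """Consume published records and append each one as a line to path."""
    while True:
        record = channel.get()
        if record is None:
            break
        cluster.write(path, record + b"\n")


if __name__ == "__main__":
    channel = queue.Queue()
    cluster = ToyHdfsCluster()
    source_service([b"event-1", b"event-2 from log"], channel)
    hdfs_target_service(channel, cluster, "/vds/events.log")
    print(cluster.read("/vds/events.log"))
```

The model keeps metadata and data separate, as the sample deployment describes: reading a file starts with a lookup on the NameNode before any DataNode is contacted.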