Use a Hadoop Distributed File System (HDFS) target service to write data to HDFS. To create an HDFS target service, use the HDFS target service type. You can configure the target service for target file rollover. You can also perform advanced configurations to avoid data loss in high availability and load balancing deployments.
If HDFS is Kerberos enabled, create the hdfs super user principal. Ensure that the Hadoop users have a Kerberos principal or keytab to get the Kerberos credentials that are required to access the cluster and use the Hadoop services.
Before you deploy a data flow that uses HDFS target services, perform the following tasks:
Install the HDFS distribution that you want and set an environment variable that EDS can use to find the client libraries for the HDFS distribution. Verify that you have the client libraries installed in the path where the EDS Node is running.
For example, if you have a Cloudera Distribution, download the libraries from the Cloudera Downloads page.
Set an environment variable on each host on which an HDFS target service runs. The environment variable must point to the Hadoop base directory. The environment variable is of the form HADOOPBASEDIR=<Hadoop_Home_Directory>. For example, HADOOPBASEDIR=/usr/hadoop-2.0.2-alpha.
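Before deployment, it can help to confirm on each host that HADOOPBASEDIR is set and points to an existing directory. The following Python sketch is one way to do that. The function name check_hadoop_base_dir is hypothetical and is not part of EDS or Hadoop. The check only confirms that the directory exists; it does not verify that the client libraries inside it are complete or compatible.

```python
import os
from pathlib import Path


def check_hadoop_base_dir(env=None):
    """Return the directory named by HADOOPBASEDIR, or raise RuntimeError.

    Hypothetical helper for illustration only. `env` defaults to the
    environment of the current process; pass a dict to check other values.
    """
    env = os.environ if env is None else env
    value = env.get("HADOOPBASEDIR", "").strip()
    if not value:
        raise RuntimeError("HADOOPBASEDIR is not set on this host")
    base = Path(value)
    if not base.is_dir():
        raise RuntimeError(f"HADOOPBASEDIR does not point to a directory: {base}")
    return base
```

Calling check_hadoop_base_dir() with no arguments, from the same user account and shell environment that starts the EDS Node, shows whether that process would see the variable.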
The Retry on Failure and Number of Retries properties are not applicable for the HDFS target service.