Connecting to an HDFS Cluster from Informatica Vibe Data Stream for Machine Data 2.1.0

Adding the Source Service and Target Service to the Data Flow

After you create the data flow, add the source service and target service to the data flow.
  1. In the Data Flows pane, click the data flow to which you want to add a source.
  2. From the Entity Types pane, drag the File source service to the Data Flow Designer pane.
    The New Source dialog box appears.
  3. Specify the properties of the source service and click OK.
  4. From the Entity Types pane, drag the HDFS target service to the Data Flow Designer pane.
    The New Target dialog box appears.
  5. Configure the following properties for the HDFS target service type:
    Entity Name
      Name of the HDFS target service. Maximum length is 32 characters.
    Destination
      URI of the target file to which the target service writes data. The HDFS target service type supports the following HDFS URI format:
        hdfs://<namenode-name>[:<port>]/<path>/<file-name>
      Where:
      • namenode-name is the host name or IP address of the HDFS NameNode.
      • port is the port number on which the HDFS NameNode listens for connections. You can omit the port number if you have configured HDFS to listen for connections on the default port, 8020.
      • path and file-name represent the location of the target file in the target file system.
      This URI format is suitable for a standalone HDFS target service. It is also suitable for an HDFS target service that runs on a node that is not part of a high-availability setup. To use multiple target service instances for load balancing or high availability, use variables in the URI.
      The destination URI must be the same URI that you used to verify the connection to the HDFS cluster.
    Rollover Count
      Maximum number of files that can exist on the target at any one time. Default is 1024.
    Rollover Size
      Target file size, in gigabytes (GB), at which to trigger rollover. Default is 1. A value of zero (0) means that the HDFS target service does not perform rollover based on size.
    Rollover Time
      Length of time, in hours, to keep a target file active. After the time period has elapsed, the target service rolls the file over. Default is 0. A value of zero (0) means that the HDFS target service does not perform rollover based on time.
    Force Synchronization
      Flushes the client's buffer to the disk device every second. If you enable forceful synchronization, data that the target service writes is visible to other readers immediately. Forceful synchronization degrades the performance of VDS. For more information about forceful synchronization, see the Hadoop documentation. By default, the target service does not synchronize forcefully.
    UM XML Configuration
      UM configuration, in XML format, that the target service uses. Maximum length is 1000 characters.
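Before you enter a Destination value, you can check that the URI has the expected shape. The following Python sketch is not part of VDS. The names `parse_hdfs_destination` and `HdfsDestination` are hypothetical. The sketch splits an `hdfs://` URI into the NameNode host, port, directory, and file name. If the URI omits the port, it falls back to the default port 8020 described for the Destination property. It checks syntax only and does not contact the NameNode or confirm that the path exists.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlsplit

# Default NameNode port cited in the Destination property description.
DEFAULT_NAMENODE_PORT = 8020


@dataclass
class HdfsDestination:
    """Hypothetical container for the parts of a Destination URI."""
    namenode: str
    port: int
    directory: str
    file_name: str


def parse_hdfs_destination(uri: str) -> HdfsDestination:
    """Split hdfs://<namenode-name>[:<port>]/<path>/<file-name> into its parts."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {uri!r}")
    if not parts.hostname:
        raise ValueError("the URI must name the HDFS NameNode host")
    # urlsplit reports port as None when the URI omits it.
    port = parts.port if parts.port is not None else DEFAULT_NAMENODE_PORT
    # Separate the directory path from the target file name.
    directory, file_name = posixpath.split(parts.path)
    if not file_name:
        raise ValueError("the URI must end with a target file name")
    return HdfsDestination(parts.hostname, port, directory, file_name)
```

For example, `parse_hdfs_destination("hdfs://namenode01:9000/data/vds/events.log")` yields port 9000 and the file name `events.log`. A URI without a port, such as `hdfs://10.0.0.5/logs/app.log`, yields port 8020.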
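The Rollover Size and Rollover Time properties each trigger rollover independently, and a value of zero disables either one. The following Python sketch models only that rule. It is not VDS code, and the names `RolloverPolicy` and `should_roll_over` are hypothetical. Two details are assumptions because the documentation does not state them: a gigabyte is taken as 2^30 bytes, and a threshold counts as reached when the measured value equals the limit. The sketch does not model Rollover Count.

```python
from dataclasses import dataclass

# Assumption: the documentation does not say whether 1 GB means 10**9 or 2**30 bytes.
BYTES_PER_GB = 1024 ** 3
SECONDS_PER_HOUR = 3600


@dataclass
class RolloverPolicy:
    """Hypothetical model of the Rollover Size and Rollover Time properties."""
    size_gb: float = 1.0     # Rollover Size: documented default 1; 0 disables size-based rollover
    time_hours: float = 0.0  # Rollover Time: documented default 0, which disables time-based rollover


def should_roll_over(policy: RolloverPolicy, file_size_bytes: int, file_age_seconds: float) -> bool:
    """Return True when an enabled size or time threshold has been reached."""
    # Assumption: reaching a threshold exactly counts as crossing it.
    if policy.size_gb > 0 and file_size_bytes >= policy.size_gb * BYTES_PER_GB:
        return True
    if policy.time_hours > 0 and file_age_seconds >= policy.time_hours * SECONDS_PER_HOUR:
        return True
    return False
```

With the default values, only file size triggers rollover. A file that reaches 1 GB rolls over, but a small file that has been open for days does not. Setting both properties to 0 disables size-based and time-based rollover entirely.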
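You can check the documented length limits before you submit the New Target dialog box. The following Python sketch is hypothetical and not part of VDS. It enforces the 32-character limit on Entity Name and the 1000-character limit on UM XML Configuration. Its other checks are assumptions, not documented VDS rules. It rejects an empty entity name, negative rollover values, and a Rollover Count below 1. It also checks that the UM XML is well-formed by parsing it with Python's `xml.etree.ElementTree`.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

MAX_ENTITY_NAME_LENGTH = 32  # documented limit for Entity Name
MAX_UM_XML_LENGTH = 1000     # documented limit for UM XML Configuration


def validate_hdfs_target(
    entity_name: str,
    um_xml: str = "",
    rollover_count: int = 1024,
    rollover_size_gb: float = 1.0,
    rollover_time_hours: float = 0.0,
) -> list[str]:
    """Hypothetical pre-check; returns a list of problems (empty if all checks pass)."""
    problems: list[str] = []

    # Documented limits.
    if len(entity_name) > MAX_ENTITY_NAME_LENGTH:
        problems.append(f"Entity Name exceeds {MAX_ENTITY_NAME_LENGTH} characters")
    if len(um_xml) > MAX_UM_XML_LENGTH:
        problems.append(f"UM XML Configuration exceeds {MAX_UM_XML_LENGTH} characters")

    # Assumed checks, not documented VDS rules.
    if not entity_name:
        problems.append("Entity Name is empty")
    if rollover_count < 1:
        problems.append("Rollover Count must be at least 1")
    if rollover_size_gb < 0:
        problems.append("Rollover Size must not be negative")
    if rollover_time_hours < 0:
        problems.append("Rollover Time must not be negative")
    if um_xml:
        try:
            ET.fromstring(um_xml)  # raises ParseError on malformed XML
        except ET.ParseError as exc:
            problems.append(f"UM XML Configuration is not well-formed XML: {exc}")

    return problems
```

The function collects every problem it finds instead of stopping at the first one, so one call reports all invalid fields together.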
