to design the flow of data from the data source to the data target and to deploy the data flow.
The Administrator Daemon pushes the data flow configuration information to Apache ZooKeeper. The EDS Nodes download the configuration information and start the source services and target services that the configuration specifies. Source services read data in blocks and publish messages through a data connection. Target services receive the data and write the data to a data target. The EDS Node monitors the entities in the data flow and sends information about state and statistics to the Administrator Daemon. The Administrator Daemon sends this information to the Administrator tool.
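As a rough illustration of the source and target behavior described above, the following Python sketch models a source service that reads data in blocks and publishes each block as a message, and a target service that receives the messages and writes them to a target. It is not EDS code. The function names, the block size, the in-memory queue that stands in for the data connection, and the statistics keys are all assumptions made for illustration.

```python
import io
import queue

BLOCK_SIZE = 8  # hypothetical block size in bytes, kept small for illustration


def source_service(stream, connection, block_size=BLOCK_SIZE):
    """Read the stream in fixed-size blocks and publish each block as a message."""
    published = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        connection.put(block)
        published += 1
    connection.put(None)  # sentinel: tells the target that no more messages follow
    return published


def target_service(connection, sink):
    """Receive messages until the sentinel arrives and write each one to the sink."""
    received = 0
    while True:
        message = connection.get()
        if message is None:
            break
        sink.write(message)
        received += 1
    return received


def run_flow(data):
    """Run one source and one target, and return the written data plus statistics."""
    connection = queue.Queue()  # stands in for the data connection
    sink = io.BytesIO()         # stands in for the data target
    stats = {
        "messages_published": source_service(io.BytesIO(data), connection),
        "messages_received": target_service(connection, sink),
    }
    return sink.getvalue(), stats
```

In this sketch, the statistics dictionary plays the role of the state and statistics information that a node would report. The real services run concurrently on separate hosts, whereas the sketch runs them one after the other in a single process.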
For example, an application writes log data to log files in the following directory: /usr/app/logs/. You want to transfer the data contained in the log files to an HDFS cluster. To transfer the data, install EDS Nodes on the application host machine and the target host machine. As part of the post-installation tasks, start an EDS Node named Node1 on the application host and an EDS Node named Node2 on the target host.
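The mapping of services to nodes in this example can be pictured with a small Python sketch. The dictionary layout, the service names log_source and hdfs_target, and the helper function are hypothetical. They do not reflect the format that EDS stores in ZooKeeper and only show how each node could work out which services it should start.

```python
# Hypothetical model of the example data flow. The key names and the service
# names are assumptions for illustration, not the EDS configuration format.
DATA_FLOW = {
    "services": [
        {
            "name": "log_source",
            "role": "source",
            "node": "Node1",
            "source_directory": "/usr/app/logs/",
        },
        {
            "name": "hdfs_target",
            "role": "target",
            "node": "Node2",
        },
    ],
    # Each link connects a source service to a target service.
    "links": [("log_source", "hdfs_target")],
}


def services_for_node(flow, node_name):
    """Return the names of the services that the flow maps to the given node."""
    return [s["name"] for s in flow["services"] if s["node"] == node_name]
```

With this model, Node1 would start only the source service and Node2 only the target service, which matches the division of work in the example.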
The following image shows how EDS works:
The image numbers the operations in the order of occurrence. The following steps describe the sequence of operations:
1. Use the Administrator tool to create and deploy a data flow. When you configure the data connection in the data flow, use an Ultra Messaging or a WebSockets data connection. In the data flow, create a source service. Specify the source directory as /usr/app/logs/, and map the service to Node1. Create an HDFS target service and map the target service to Node2. Connect the source service to the target service, and add any transformations that you want to apply to the data. Finally, deploy the data flow. The Administrator Daemon sends the data flow configuration information to ZooKeeper.
2. The EDS Nodes download the data flow configuration information from ZooKeeper. Node1 starts a source service. Similarly, Node2 starts a target service.
3. The source service reads data from the source files and publishes that data as messages on a topic. EDS applies the transformations that you added to the data flow. The target service subscribes to the topic, receives the data, and writes it to the HDFS cluster.
4. The EDS Node sends information about state and statistics to the Administrator Daemon. The Administrator Daemon publishes the information through the Edge Data Streaming Service. You can view the information on the