Table of Contents

  1. Preface
  2. Introduction to Informatica Edge Data Streaming
  3. Licenses
  4. Using Informatica Administrator
  5. Creating and Managing the Edge Data Streaming Service
  6. Edge Data Streaming Entity Types
  7. Edge Data Streaming Nodes
  8. Data Connections
  9. Working With Data Flows
  10. Managing the Edge Data Streaming Components
  11. Security
  12. High Availability
  13. Disaster Recovery
  14. Monitoring Edge Data Streaming Entities
  15. Appendix A: Troubleshooting
  16. Appendix B: Frequently Asked Questions
  17. Appendix C: Regular Expressions
  18. Appendix D: Command Line Program
  19. Appendix E: Configuring Edge Data Streaming to Work With a ZooKeeper Observer
  20. Appendix F: Glossary

User Guide

Edge Data Streaming Data Flow Process
You use the Administrator tool to design the flow of data from the data source to the data target and to deploy the data flow. The Administrator Daemon pushes the data flow configuration information to Apache ZooKeeper. The EDS Nodes download the configuration information and start the source services and target services that the configuration specifies. Source services read data in blocks and publish messages through a data connection. Target services receive the data and write the data to a data target. The EDS Node monitors the entities in the data flow and sends information about state and statistics to the Administrator Daemon. The Administrator Daemon sends this information to the Administrator tool.
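The source-to-target pattern that this paragraph describes can be sketched as code. The classes and names below are illustrative only, not the EDS API: in a real deployment the data connection is Ultra Messaging or WebSockets rather than an in-memory router, and the target writes to a system such as HDFS rather than a byte buffer.

```python
# Hypothetical sketch of the EDS data flow pattern: a source service reads
# a stream in fixed-size blocks and publishes each block as a message on a
# topic; a target service subscribed to that topic writes each block out.
import io
from collections import defaultdict

class DataConnection:
    """Stand-in for the messaging layer: routes messages by topic."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

class SourceService:
    """Reads a source stream in blocks and publishes each block."""
    def __init__(self, connection, topic, block_size=1024):
        self.connection = connection
        self.topic = topic
        self.block_size = block_size

    def run(self, stream):
        while True:
            block = stream.read(self.block_size)
            if not block:
                break
            self.connection.publish(self.topic, block)

class TargetService:
    """Receives messages on a topic and writes them to a target stream."""
    def __init__(self, connection, topic, target):
        self.target = target
        connection.subscribe(topic, self.write)

    def write(self, message):
        self.target.write(message)

# Usage: stream log data from an in-memory "file" to an in-memory target.
conn = DataConnection()
target = io.BytesIO()
TargetService(conn, "app.logs", target)
source = SourceService(conn, "app.logs", block_size=16)
source.run(io.BytesIO(b"line 1\nline 2\nline 3\n"))
print(target.getvalue())  # the target received every block, in order
```

Because the target subscribes before the source starts publishing, no blocks are lost; the same ordering constraint applies in the example that follows, where the target service must be running before data flows.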
For example, an application writes log data to log files in the directory /usr/app/logs/. You want to transfer the data contained in the log files to an HDFS cluster. To transfer the data, install EDS Nodes on the application host machine and the target host machine. As part of the post-installation tasks, start an EDS Node, Node1, on the application host and an EDS Node, Node2, on the target host.
The following image shows how EDS works:

[Image: The sequence of operations in Edge Data Streaming, numbered in the order of occurrence.]

The following steps describe the sequence of operations:
  1. Use the Administrator tool to create and deploy a data flow. When you configure the data connection in the data flow, use an Ultra Messaging or a WebSockets data connection. In the data flow, create a source service. Specify the source directory as /usr/app/logs/ and map the service to Node1. Create an HDFS target service and map the target service to Node2. Connect the source service to the target service, and add any transformations that you want to apply to the data. Finally, deploy the data flow. The Administrator Daemon sends the data flow configuration information to ZooKeeper.
  2. The EDS Nodes download the data flow configuration information from ZooKeeper. The EDS Node Node1 starts a source service. Similarly, Node2 starts a target service.
  3. The source service reads data from the source files and publishes that data as messages on a topic. EDS applies the transformations that you added to the data flow. The target service subscribes to the topic, receives the data, and writes it to the HDFS cluster.
  4. The EDS Node sends information about state and statistics to the Administrator Daemon. The Administrator Daemon publishes the information through the Edge Data Streaming Service. You can view the information on the Monitoring tab in the Administrator tool.
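Step 3 notes that EDS applies the transformations you added to the data flow between the source service and the target service. A minimal sketch of that idea, assuming a simple chain of per-message functions (the transformation names here are invented for illustration and are not EDS transformations):

```python
# Hypothetical transformation chain: each message published by the source
# passes through every transformation, in order, before the target
# service writes it out.
def to_upper(message: str) -> str:
    """Illustrative transformation: normalize log text to upper case."""
    return message.upper()

def add_prefix(message: str) -> str:
    """Illustrative transformation: tag each message with its node name."""
    return "[Node1] " + message

def apply_transformations(messages, transformations):
    """Run every message through the transformation chain in order."""
    for message in messages:
        for transform in transformations:
            message = transform(message)
        yield message

# Usage: two log blocks flow through the chain before reaching the target.
blocks = ["error: disk full\n", "info: retry ok\n"]
for out in apply_transformations(blocks, [to_upper, add_prefix]):
    print(out, end="")
```

Because the chain runs inside the data flow, the target service receives only the transformed messages, which matches the behavior described in step 3.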
