Metadata Manager Administrator Guide

Linking for Cloudera Entities

Metadata Manager can display data lineage links between entities in a Cloudera Navigator resource and data objects in other resources. To link Hive tables with data objects in other packaged resources or in universal resources, use connection assignments. To link HDFS files with data objects in other packaged resources, in custom resources, or in universal resources, use a linking rules file.
Metadata Manager links Hive tables in a Cloudera Navigator resource with data objects in another resource when you configure connection assignments between the metadata sources. For example, a PowerCenter session loads data to a Hive target table that exists in your Cloudera distribution. Before you can view data lineage between the PowerCenter target and the Hive table, you must configure a connection assignment between the PowerCenter repository and the Cloudera distribution.
Metadata Manager does not use connection assignments to link HDFS files in a Cloudera Navigator resource with data objects in other resources. To link HDFS files with data objects in other resources, use a linking rules file.
For example, your Cloudera distribution contains the HDFS file big-customer.csv. You use the data in this file to populate the CUST flat file data object in the Developer tool. The CUST flat file data object is used as a source in an HDFS mapping.
Create a linking rules file to link the big-customer.csv HDFS file to the CUST flat file data object. Add a link condition that creates links from the HDFS file to the columns in the output group of the flat file data object.
In Metadata Manager, the CUST flat file data object belongs to the HDFS Data Object class. The output group belongs to the Data Object Read class. The columns in the output group belong to the Attribute class.
Use the following file to create the links:
<?xml version="1.0" encoding="UTF-8"?>
<ruleSet name="Link HDFS files to Informatica Platform FF Data Objects">
    <sourceResource name="Cloudera01"/>
    <targetResource name="InfaPlatform01"/>
    <rule name="Link HDFS big-customer.csv to Informatica Platform CUST FF columns" direction="SourceToTarget">
        <sourceFilter>
            <element class="HDFS File"/>
        </sourceFilter>
        <targetFilter>
            <!-- We must link to features. If we link to structures only, Metadata Manager will not find upstream links to the HDFS file. -->
            <element class="HDFS Data Object">
                <element class="Data Object Read">
                    <element class="Attribute"/>
                </element>
            </element>
        </targetFilter>
        <link condition="source.Name = 'big-customer.csv' AND target.parent.Name = 'output' AND target.parent.parent.Name = 'CUST'"/>
    </rule>
</ruleSet>
In this example, the target.parent.Name = 'output' clause in the link condition identifies the output group. The target.parent.parent.Name = 'CUST' clause identifies the flat file data object.
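As an illustration only, the following Python sketch models how the link condition walks from a column up to its output group and then to its data object. The Node class, the column names (CUST_ID, CUST_NAME, ORDER_ID), and the ORDERS data object are hypothetical and are not part of Metadata Manager. The sketch mimics the condition in plain Python and does not call any Metadata Manager API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy stand-in for a metadata object; not a Metadata Manager API."""
    name: str
    cls: str
    parent: Optional["Node"] = None


def matches(source: Node, target: Node) -> bool:
    """Python mirror of the link condition used in the rules file."""
    return (
        source.name == "big-customer.csv"
        and target.parent is not None
        and target.parent.name == "output"          # the output group
        and target.parent.parent is not None
        and target.parent.parent.name == "CUST"     # the flat file data object
    )


# Hypothetical objects: column names and the ORDERS data object are invented.
hdfs_file = Node("big-customer.csv", "HDFS File")
cust = Node("CUST", "HDFS Data Object")
cust_output = Node("output", "Data Object Read", parent=cust)
orders = Node("ORDERS", "HDFS Data Object")
orders_output = Node("output", "Data Object Read", parent=orders)

columns = [
    Node("CUST_ID", "Attribute", parent=cust_output),
    Node("CUST_NAME", "Attribute", parent=cust_output),
    Node("ORDER_ID", "Attribute", parent=orders_output),
]

# Only columns whose parent is 'output' and whose grandparent is 'CUST' match.
linked = [column.name for column in columns if matches(hdfs_file, column)]
print(linked)  # ['CUST_ID', 'CUST_NAME']
```

In this toy model, the ORDER_ID column is excluded even though its group is also named output, because the second clause requires the grandparent to be CUST.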
To upload the linking rules file, edit the Cloudera Navigator resource. After you upload the linking rules file and reload the resource, Metadata Manager creates the lineage links. Metadata Manager creates a link from the big-customer.csv HDFS file to each column in the output group of the CUST flat file data object.
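Before you upload a linking rules file, it can help to confirm locally that the file is well-formed XML and that the filters and link condition read as intended. The following Python sketch is one possible check, assuming only the elements and attributes that appear in the example file. The summarize_rules function is a hypothetical helper, and the check does not validate the file against any Metadata Manager schema.

```python
import xml.etree.ElementTree as ET

# Copy of the example linking rules file from this topic.
RULES_XML = """<?xml version="1.0" encoding="UTF-8"?>
<ruleSet name="Link HDFS files to Informatica Platform FF Data Objects">
    <sourceResource name="Cloudera01"/>
    <targetResource name="InfaPlatform01"/>
    <rule name="Link HDFS big-customer.csv to Informatica Platform CUST FF columns" direction="SourceToTarget">
        <sourceFilter>
            <element class="HDFS File"/>
        </sourceFilter>
        <targetFilter>
            <element class="HDFS Data Object">
                <element class="Data Object Read">
                    <element class="Attribute"/>
                </element>
            </element>
        </targetFilter>
        <link condition="source.Name = 'big-customer.csv' AND target.parent.Name = 'output' AND target.parent.parent.Name = 'CUST'"/>
    </rule>
</ruleSet>
"""


def element_classes(filter_node):
    """Return the class attribute of every nested <element>, outermost first."""
    if filter_node is None:
        return []
    return [element.get("class") for element in filter_node.iter("element")]


def summarize_rules(xml_text):
    """Hypothetical helper: parse the file and list what each rule links."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError on malformed XML.
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "ruleSet":
        raise ValueError(f"expected <ruleSet> as the root element, found <{root.tag}>")
    summary = []
    for rule in root.findall("rule"):
        link = rule.find("link")
        summary.append({
            "rule": rule.get("name"),
            "direction": rule.get("direction"),
            "source_classes": element_classes(rule.find("sourceFilter")),
            "target_classes": element_classes(rule.find("targetFilter")),
            "condition": link.get("condition") if link is not None else None,
        })
    return summary


summary = summarize_rules(RULES_XML)
for entry in summary:
    print(entry["rule"])
    print("  source classes:", entry["source_classes"])
    print("  target classes:", entry["target_classes"])
    print("  condition:", entry["condition"])
```

The target class list is printed from the outermost class to the innermost, which matches the nesting of data object, output group, and column described above.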
For more information about rule-based links, see the Metadata Manager Custom Metadata Integration Guide.
