Lookup Transformation on the Spark Engine

Some processing rules for the Spark engine differ from the processing rules for the Data Integration Service.

Multiple Matches

The Lookup transformation finds values based on the condition you configure in the transformation. Choose how to handle multiple matches in the lookup source.
Return First Row; Return Last Row
The transformation returns the first matching result or the last matching result.
The Data Integration Service orders results to identify the first and last row. The following rules determine result order:
  • Ordering depends on the lookup ports present in the lookup condition and lookup output ports.
  • Numerical value sorting is by ascending order.
  • String value sorting is lexicographical.
  • Date value sorting is by earliest date first.
  • When a mapping contains a non-equijoin comparison (<=, >=, <, >, !=) and some rows contain NULL values, results might differ depending on the run-time engine.
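The ordering rules above can be illustrated with a minimal Python sketch. It is not the Data Integration Service implementation. The sample rows, the column layout (order_id, status, ship_date), and the left-to-right precedence of the columns are all assumptions made for the example. Python also compares strings by code point, which might not match the collation that the service applies.

```python
from datetime import date

# Hypothetical rows that all match one lookup condition.
# Assumed column layout: (order_id, status, ship_date).
matches = [
    (1003, "shipped", date(2021, 3, 9)),
    (1001, "pending", date(2021, 5, 1)),
    (1001, "open", date(2021, 5, 1)),
    (1001, "open", date(2020, 12, 31)),
]

# Tuples compare element by element, so one sort applies all three rules:
# numbers ascending, strings lexicographically, dates earliest first.
ordered = sorted(matches)

first_row = ordered[0]   # candidate for Return First Row
last_row = ordered[-1]   # candidate for Return Last Row

print(first_row)
print(last_row)
```

In this sketch, the two rows that tie on order_id and status are separated by the date column, so the row with the earliest ship_date sorts first.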
Return Any Row
The transformation returns any of the rows that match the lookup condition. The transformation creates an index based on the key ports instead of all Lookup transformation ports. When you choose this option, performance can improve because the process of indexing rows is simpler.
Return All Rows
The Lookup transformation returns all rows that match the lookup condition.
Report Error
When the Lookup transformation uses a static cache or no cache, the Data Integration Service marks the row as an error. The Lookup transformation writes the row to the session log by default, and increases the error count by one.
When the Lookup transformation has a dynamic cache, the Data Integration Service fails the session when it encounters multiple matches. The session fails while the Data Integration Service is caching the lookup table or looking up the duplicate key values.
Also, if you configure the Lookup transformation to output old values on updates, the Lookup transformation returns an error when it encounters multiple matches. The transformation creates an index based on the key ports instead of all Lookup transformation ports.
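The multiple-match options can be compared side by side with a short Python sketch. The function name resolve_matches and the policy strings are hypothetical and exist only for this illustration. The sketch assumes that the matching rows are already ordered. It raises an exception to stand in for the Report Error behavior, whereas the product marks the row as an error or fails the session depending on the cache type.

```python
def resolve_matches(matches, policy):
    """Select output rows from ordered lookup matches (hypothetical helper)."""
    if not matches:
        return []
    if policy == "first":
        return [matches[0]]
    if policy == "last":
        return [matches[-1]]
    if policy == "any":
        # Any single matching row is acceptable; this sketch picks the first.
        return [matches[0]]
    if policy == "all":
        return list(matches)
    if policy == "error":
        if len(matches) > 1:
            # Stand-in for Report Error; not the product's actual mechanism.
            raise ValueError(f"{len(matches)} rows match the lookup condition")
        return [matches[0]]
    raise ValueError(f"unknown policy: {policy!r}")


rows = ["row_a", "row_b", "row_c"]
print(resolve_matches(rows, "first"))
print(resolve_matches(rows, "last"))
print(resolve_matches(rows, "all"))
```

With a single matching row, every policy in the sketch returns that row, which is why the choice matters only when the lookup source contains duplicates for the lookup condition.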

Rules and Guidelines

Mapping validation fails in the following situations:
  • Case sensitivity is disabled.
  • The lookup condition contains a binary data type.
  • The lookup condition uses a field with a complex data type.
  • The cache is configured to be shared, named, persistent, dynamic, or uncached. The cache must be a static cache.
The mapping fails in the following situation:
  • The transformation is unconnected and used with a Joiner or Java transformation.
If an HBase lookup does not result in a match, it generates a row with null values for all columns. You can add a Filter transformation after the Lookup transformation to filter out null rows.
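The logic of such a filter can be sketched in Python. This is an illustration of the filter condition only, not Filter transformation syntax. The function name drop_unmatched and the column names are hypothetical, and rows are modeled as dictionaries.

```python
def drop_unmatched(rows, lookup_columns):
    """Keep rows in which at least one lookup column is not null (hypothetical helper)."""
    return [
        row for row in rows
        if any(row.get(col) is not None for col in lookup_columns)
    ]


# Assumed lookup output: the second row represents a lookup with no match.
rows = [
    {"cust_id": 1, "name": "Ana", "city": "Lima"},
    {"cust_id": 2, "name": None, "city": None},
    {"cust_id": 3, "name": "Raj", "city": None},
]

kept = drop_unmatched(rows, ["name", "city"])
print([row["cust_id"] for row in kept])
```

A filter of this kind cannot distinguish a missing match from a matched row whose lookup columns are all null, so it suits data where at least one lookup column is always populated.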
