When the output data format is hierarchical, you can define join conditions for the data sources. You must configure a join condition if an output group or field has multiple data sources. Configure a join condition to join the data from the input groups or incoming fields.
Configure the join conditions for the output groups on the Hierarchy Processor tab.
Click the Join Conditions icon for the output group.
Add a join condition.
Select the left data source.
Select the join type:
Inner. Includes rows with matching join conditions. Discards rows that do not match the join conditions.
Left Outer. Includes all rows from the left pipeline and the matching rows from the right pipeline. Discards the unmatched rows from the right pipeline.
Right Outer. Includes all rows from the right pipeline and the matching rows from the left pipeline. Discards the unmatched rows from the left pipeline.
Full Outer. Includes rows with matching join conditions and all incoming data from the left pipeline and right pipeline.
If you select an outer join on a large data set, you might need to increase the Spark driver memory in the mapping task. For more information about Spark session properties, see Tasks.
Select the right data source.
Click Configure Join Condition.
Select fields and built-in functions to create the expression.
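The effect of each join type can be illustrated outside the product with a minimal sketch in plain Python. It is not the Hierarchy Processor implementation. The sample data, the field names (cust_id, name, order_id), and the join helper are all hypothetical, and the join condition is assumed to be a simple equality on one field.

```python
# Hypothetical sample data: two flat inputs that feed one output group.
customers = [  # left data source
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Grace"},
]
orders = [  # right data source
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 3},
]

JOIN_TYPES = ("inner", "left_outer", "right_outer", "full_outer")


def join(left, right, key, join_type):
    """Join two lists of dicts on an equality condition over `key`.

    Illustrative helper only. Unmatched rows are kept as-is, so the
    fields from the other side are simply absent rather than null.
    """
    if join_type not in JOIN_TYPES:
        raise ValueError(f"unknown join type: {join_type}")

    rows = []
    matched_right = set()  # indexes of right rows that found a match

    for left_row in left:
        hits = [i for i, r in enumerate(right) if r[key] == left_row[key]]
        for i in hits:
            matched_right.add(i)
            rows.append({**left_row, **right[i]})
        # Left and full outer joins keep left rows that have no match.
        if not hits and join_type in ("left_outer", "full_outer"):
            rows.append(dict(left_row))

    # Right and full outer joins keep right rows that have no match.
    if join_type in ("right_outer", "full_outer"):
        for i, right_row in enumerate(right):
            if i not in matched_right:
                rows.append(dict(right_row))

    return rows


for join_type in JOIN_TYPES:
    print(join_type, join(customers, orders, "cust_id", join_type))
```

With this sample data, the inner join returns only the row for customer 1, each one-sided outer join adds the single unmatched row from the side it preserves, and the full outer join returns all three rows.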