When you join data from the same source, you can create two branches of the pipeline.
When you branch a pipeline, you must add a transformation between the mapping input and the Joiner transformation in at least one branch of the pipeline. You must join sorted data and configure the Joiner transformation for sorted input.
For example, you have a source with the following ports:
Employee
Department
Total Sales
In the target, you want to view the employees who generated sales that were greater than the average sales for their departments. To do this, you create a mapping with the following transformations:
Sorter transformation. Sorts the data.
Sorted Aggregator transformation. Averages the sales data and groups it by department. When you perform this aggregation, you lose the data for individual employees. To keep the employee data, pass one branch of the pipeline to the Aggregator transformation and pass a second branch with the same data to the Joiner transformation. When you join both branches of the pipeline, you join the aggregated data with the original data.
Sorted Joiner transformation. Joins the sorted aggregated data with the original data.
Filter transformation. Compares the average sales data against the sales data for each employee and filters out employees whose sales are not above the average for their department.
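[Figure: mapping that joins two branches of the same pipeline. Labels: Employees_West source; pipeline branch 1; pipeline branch 2; Sorted Joiner transformation; Filter transformation that filters out employees with less than above average sales.]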
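The logic of this example mapping can be sketched outside the mapping designer in plain Python. This is only an illustration of the data flow (sort, aggregate one branch, join it back to the original rows, filter), not how the product runs the mapping. The sample rows, the department names, and the function name `above_department_average` are assumptions made for the sketch.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# Hypothetical sample rows: (employee, department, total_sales)
rows = [
    ("Ana", "Hardware", 120.0),
    ("Ben", "Software", 300.0),
    ("Cho", "Hardware", 80.0),
    ("Dev", "Software", 100.0),
    ("Eve", "Hardware", 100.0),
]

def above_department_average(rows):
    # Sorter: sort by department so both branches share the same key order.
    sorted_rows = sorted(rows, key=itemgetter(1))

    # Branch 1, sorted Aggregator: one average per department.
    # The per-employee detail is lost in this branch.
    averages = [
        (dept, mean(r[2] for r in group))
        for dept, group in groupby(sorted_rows, key=itemgetter(1))
    ]

    # Branch 2 is sorted_rows itself, passed through unchanged.
    # Sorted Joiner: merge the two sorted branches on department.
    joined = []
    avg_iter = iter(averages)
    current = next(avg_iter, None)
    for emp, dept, sales in sorted_rows:
        while current is not None and current[0] < dept:
            current = next(avg_iter, None)
        if current is not None and current[0] == dept:
            joined.append((emp, dept, sales, current[1]))

    # Filter: keep employees whose sales exceed the department average.
    return [(e, d, s) for e, d, s, avg in joined if s > avg]

print(above_department_average(rows))
```

With the sample rows, the Hardware average is 100 and the Software average is 200, so only Ana and Ben pass the filter. Because both branches are sorted on the join key, the join can advance through them in a single pass.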
Joining two branches might decrease performance if the Joiner transformation receives data from one branch much later than the other branch. The Joiner transformation caches all the data from the first branch and writes the cache to disk if the cache fills. The Joiner transformation must then read the data from disk when it receives the data from the second branch.
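The cost described above can be pictured with a toy cache that holds rows in memory up to a limit and spills the rest to a temporary file. This is a simplified model for illustration only. The class name `SpillingCache`, the `max_rows` limit, and the use of `pickle` are assumptions of the sketch and say nothing about how the Joiner transformation is implemented internally.

```python
import pickle
import tempfile

class SpillingCache:
    """Toy cache: keeps up to max_rows in memory, writes the overflow to disk."""

    def __init__(self, max_rows):
        self.max_rows = max_rows
        self.memory = []
        self.spill = None   # temporary file, created only when the cache fills
        self.spilled = 0    # number of rows written to disk

    def add(self, row):
        if len(self.memory) < self.max_rows:
            self.memory.append(row)
            return
        if self.spill is None:
            self.spill = tempfile.TemporaryFile()
        self.spill.seek(0, 2)  # always append at the end of the file
        pickle.dump(row, self.spill)
        self.spilled += 1

    def rows(self):
        # Rows in memory are cheap to return; spilled rows need a disk read.
        yield from self.memory
        if self.spill is not None:
            self.spill.seek(0)
            for _ in range(self.spilled):
                yield pickle.load(self.spill)

# The first branch arrives early and is cached in full.
first_branch = [("Hardware", 100.0), ("Software", 200.0), ("Support", 50.0)]
cache = SpillingCache(max_rows=2)
for row in first_branch:
    cache.add(row)

# When the second branch finally arrives, the cached rows are read back,
# including the ones that have to come from disk.
print(cache.spilled, list(cache.rows()))
```

In this model, the longer the second branch lags behind, the more of the first branch accumulates past the memory limit, and every spilled row costs a disk write and a later disk read.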