Performing a join in a database is faster than performing a join in the session. In some cases, this is not possible, such as joining tables from two different databases or flat file systems. If you want to perform a join in a database, use the following options:
Create a pre-session stored procedure to join the tables in a database.
Use the Source Qualifier transformation to perform the join.
Join sorted data when possible.
You can improve session performance by configuring the Joiner transformation to use sorted input. When you configure the Joiner transformation to use sorted data, the Integration Service improves performance by minimizing disk input and output. You see the greatest performance improvement when you work with large data sets.
For an unsorted Joiner transformation, designate the source with fewer rows as the master source.
For optimal performance and disk storage, designate the source with the fewer rows as the master source. During a session, the Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
For a sorted Joiner transformation, designate the source with fewer duplicate key values as the master source.
For optimal performance and disk storage, designate the source with fewer duplicate key values as the master source. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.