Amazon Redshift Connector Best Practices

Distribution Keys

Amazon Redshift stores table data by column, so you do not define index keys on a table. Although there are no index keys to specify, you can specify a distribution key to control how the data is partitioned.
Amazon Redshift uses a distributed data architecture in which the rows of a table are partitioned across the compute nodes, and each compute node in the cluster stores a subset of the data.
Choose the distribution key column by considering the joins and aggregations that are most commonly run against a set of tables, so that rows that are joined or aggregated together are located on the same compute node. Specifying a distribution key in this manner minimizes data movement over the network. Informatica recommends that you use this option for a very large dimension table, and that you distribute both the dimension table and the tables associated with it on the join column.
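The effect of a shared distribution key can be illustrated with a small simulation. The following Python sketch is not connector code: the node count, the table and column names, and the use of `zlib.crc32` as the hash are all assumptions made for illustration, because the hash that Amazon Redshift uses internally is not described here. It only shows why rows that share a join value end up on the same node when both tables are distributed on the join column.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-key value to a node with a deterministic hash.

    crc32 is a stand-in for the hash that Amazon Redshift uses internally.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, dist_key, num_nodes=NUM_NODES):
    """Group rows by the node that their distribution-key value hashes to."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[dist_key], num_nodes)].append(row)
    return nodes


# Hypothetical tables that are commonly joined on customer_id.
customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(1, 9)]
orders = [{"order_id": 100 + i, "customer_id": (i % 8) + 1} for i in range(20)]

# Distribute both tables on the join column.
customer_nodes = distribute(customers, "customer_id")
order_nodes = distribute(orders, "customer_id")

# Check that every order is stored on the same node as its customer,
# which means the join needs no data movement between nodes.
colocated = all(
    any(c["customer_id"] == o["customer_id"] for c in customer_nodes[node])
    for node, node_orders in order_nodes.items()
    for o in node_orders
)
print("all joins are node-local:", colocated)
```

If the two tables were distributed on different columns, the same check would generally fail, and the join would require rows to be moved between nodes.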
In addition to the distribution key, when you create a target table, you can select one of the following distribution types to distribute the data:
Even distribution
Even distribution distributes the data uniformly across the compute nodes in a round-robin style. Informatica recommends that you use this option for tables that are not joined with other tables, or for tables that are joined only to tables for which the ALL distribution option is specified. For example, use even distribution for a table that is joined only with a smaller dimension table for which the ALL distribution option is specified.
ALL
You can specify the ALL distribution option to copy the table to all the nodes. Informatica recommends that you use this option for smaller dimension tables that need to be joined frequently with large distributed tables. For example, a date dimension with a few thousand entries.
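For reference, the three distribution choices described above correspond to the `DISTSTYLE` and `DISTKEY` table attributes of the Amazon Redshift `CREATE TABLE` statement. The following Python sketch builds such statements as plain strings. The helper `create_table_ddl` and all table and column names are hypothetical, and the sketch does not show how the connector itself generates DDL when you create a target table.

```python
def create_table_ddl(table, columns, dist_style, dist_key=None):
    """Return a Redshift CREATE TABLE statement for a distribution style.

    columns is a list of (column_name, sql_type) pairs.
    """
    style = dist_style.upper()
    if style not in {"KEY", "EVEN", "ALL"}:
        raise ValueError(f"unsupported distribution style: {dist_style}")
    if style == "KEY" and dist_key is None:
        raise ValueError("KEY distribution requires a distribution key column")
    if style != "KEY" and dist_key is not None:
        raise ValueError("a distribution key applies only to KEY distribution")

    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    ddl = f"CREATE TABLE {table} ({column_sql}) DISTSTYLE {style}"
    if style == "KEY":
        ddl += f" DISTKEY ({dist_key})"
    return ddl + ";"


# Large table distributed on the column that it is joined on.
sales_ddl = create_table_ddl(
    "sales",
    [("sale_id", "BIGINT"), ("customer_id", "BIGINT"), ("date_id", "INTEGER")],
    "KEY",
    dist_key="customer_id",
)

# Table that is not joined with other tables: round-robin distribution.
audit_ddl = create_table_ddl(
    "load_audit", [("batch_id", "BIGINT"), ("loaded_at", "TIMESTAMP")], "EVEN"
)

# Small date dimension that is copied to every node.
date_ddl = create_table_ddl(
    "dim_date", [("date_id", "INTEGER"), ("calendar_date", "DATE")], "ALL"
)

for statement in (sales_ddl, audit_ddl, date_ddl):
    print(statement)
```

The helper rejects a distribution key for the EVEN and ALL styles so that a mismatch between the chosen style and the key is caught before the statement reaches the database.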
