Amazon Redshift Connector Best Practices

Amazon Redshift Connector Best Practices

Analyze and Vacuum Target Table

Analyze and Vacuum Target Table

After you load a large amount of data in the Amazon Redshift tables, you must ensure that the tables are updated without any loss of disk space and all rows are sorted to regenerate the query plan.
The leader node uses the table statistics to generate a query plan. The query plan might not be optimal if the table size changes. When you run an update operation in an Amazon Redshift table, the data are deleted and then inserted. However, the records are not deleted. Instead the records are marked as delete that results in consuming spaces. Similarly, when you run an append operation, the tables are sorted periodically if the append order is not inline with the sort key. The newly inserted, updated, and deleted rows are sorted in a separate unsorted partition.
You must run the ANALYZE command to ensure that the query planner on Amazon Redshift updates the statistical metadata to build and choose optimal plans to improve the efficiency of queries. You can run the ANALYZE command during design time to update the table statistics after running the ETL job on the entire table or to specific columns that are frequently used in sorting and joins.
Additionally, in the
COPY command
property, you can set the value of the
STATUPDATE
option to
ON
to run the ANALYZE command and perform an analysis.
When the table changes or is updated, there might be free spaces and unsorted rows in the table. You must set the
Vacuum Target Table
property to restore free spaces and to sort the rows in a specified table or all tables in the database. This results in a faster query performance.

0 COMMENTS

We’d like to hear from you!