Amazon Redshift Connector Best Practices

Amazon S3 Upload Optimizations

When you upload data to Amazon S3, you can use the following methods to optimize performance:
Compress data before uploading to Amazon S3
When you read data from a source, local staging files are created before the data is uploaded to Amazon S3. You can compress the staged data in a supported compression format, and the files are decompressed when the data is written to the target. Although compression and decompression are CPU-intensive tasks, compressing the staged data improves performance when you load large amounts of data to Amazon S3.
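The connector performs staging and compression internally, but the effect is comparable to the following sketch, which gzips a staging file with Python before uploading it through boto3. The file and bucket names are hypothetical:

    # Sketch only: compress a local staging file before uploading it to S3.
    # "stage_0001.csv" and "example-staging-bucket" are hypothetical names.
    import gzip
    import shutil

    import boto3

    def compress_and_upload(staging_file, bucket, key):
        compressed_file = staging_file + ".gz"
        # Stream-compress so large staging files never sit in memory.
        with open(staging_file, "rb") as src, gzip.open(compressed_file, "wb") as dst:
            shutil.copyfileobj(src, dst)
        boto3.client("s3").upload_file(compressed_file, bucket, key)

    compress_and_upload("stage_0001.csv", "example-staging-bucket",
                        "stage/stage_0001.csv.gz")

Smaller objects move less data over the network, which is where the CPU cost of compression pays off for large loads.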
Encrypt data before uploading to Amazon S3
You can encrypt data before you upload it to Amazon S3. Use either client-side or server-side encryption to protect the data when you write data staged in Amazon S3 to an Amazon Redshift target.
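For illustration, the following sketch shows a server-side encrypted upload with boto3, where Amazon S3 encrypts the object at rest with either S3-managed keys (SSE-S3) or an AWS KMS key (SSE-KMS). The bucket, file, and key names are hypothetical:

    # Sketch only: request server-side encryption for staged files.
    import boto3

    s3 = boto3.client("s3")

    # SSE-S3: Amazon S3 manages the encryption keys (AES-256).
    s3.upload_file(
        "stage_0001.csv.gz",
        "example-staging-bucket",
        "stage/stage_0001.csv.gz",
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )

    # SSE-KMS: encrypt with a customer-managed KMS key instead.
    # "alias/example-key" is a hypothetical key alias.
    s3.upload_file(
        "stage_0002.csv.gz",
        "example-staging-bucket",
        "stage/stage_0002.csv.gz",
        ExtraArgs={"ServerSideEncryption": "aws:kms",
                   "SSEKMSKeyId": "alias/example-key"},
    )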
Upload files to Amazon S3 in parallel
You can upload files to Amazon S3 in parallel. The Secure Agent, Data Integration Service, or PowerCenter Integration Service uses information about the compute nodes of the Amazon Redshift cluster to determine how to split the data into files and upload them to Amazon S3. The number of files that are staged and uploaded in parallel depends on both the number of partitions in the mapping and the number of slices in the compute nodes.
The Secure Agent, Data Integration Service, or PowerCenter Integration Service processes data in parallel for a partitioned mapping.
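The connector manages this parallelism for you, but the following sketch illustrates the idea with boto3 and a thread pool. The part-file names and the part count are hypothetical:

    # Sketch only: upload several staged part files to S3 concurrently.
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    s3 = boto3.client("s3")  # boto3 clients are safe to share across threads
    bucket = "example-staging-bucket"
    part_files = [f"stage_{i:04d}.csv.gz" for i in range(8)]  # e.g. one per slice

    def upload(part):
        s3.upload_file(part, bucket, f"stage/{part}")
        return part

    # One worker per part file keeps every upload in flight at the same time.
    with ThreadPoolExecutor(max_workers=len(part_files)) as pool:
        for done in pool.map(upload, part_files):
            print(f"uploaded {done}")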
Parallelizing Copy to Amazon Redshift
When you write data from Amazon S3 to an Amazon Redshift target, split the data into multiple files so that the number of files is equal to or a multiple of the number of slices in the cluster. Then, load the data in parallel, with each slice reading data from its own dedicated file.
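The following sketch shows the underlying COPY pattern, issued here with psycopg2. A COPY from a key prefix loads every matching file, and the slices read the files in parallel. The cluster endpoint, credentials, table, bucket, and IAM role are hypothetical:

    # Sketch only: one COPY statement loads all part files under the prefix,
    # so each Redshift slice can read its own dedicated file in parallel.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="dev", user="admin", password="...",
    )
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY public.sales
            FROM 's3://example-staging-bucket/stage/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
            GZIP
            FORMAT AS CSV;
        """)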
The following image shows how data is uploaded to Amazon S3 and written to the Amazon Redshift target:
