Performance Tuning and Sizing Guidelines for PowerExchange for Amazon Redshift on the Spark Engine

Performance Sizing Recommendations

To optimize the performance of the machine where the Data Integration Service runs, tune the hardware parameters based on the size of the data set and the recommended hardware capacities.
Performance sizing recommendations depend on the size of the data set. Consider the sizing recommendations for the following types of data set:
Sandbox data set
A data set is considered to be a sandbox data set if you read from or write to less than 1 GB of data.
For a sandbox data set, you can read 1 GB of data from an Amazon Redshift source to HDFS and write data to an Amazon Redshift target.
The following table describes the sizing recommendations for the system that processes sandbox data sets:
Requirement | Sizing Recommendation | Hadoop Services Recommendation
VCPUs       | 2                     | 4 * 1 = 4
Memory      | 8 GB                  | 2 * 1 = 2 GB
Storage     | 14 GB                 | 1 * 2 = 2 GB
Small data set
A data set is considered to be a small data set if you read from or write to less than 100 GB of data.
For a small data set, you can read 100 GB of data from an Amazon Redshift source to HDFS in 5:00:00 (HH:MM:SS) and write data to an Amazon Redshift target in 5:00:00 (HH:MM:SS).
The following table describes the sizing recommendations for the system that processes small data sets:
Requirement | Sizing Recommendation | Hadoop Services Recommendation
VCPUs       | 2                     | 5 * 1 = 5
Memory      | 8 GB                  | 6 * 1 = 6 GB
Storage     | 14 GB                 | 100 * 3 = 300 GB
Medium data set
A data set is considered to be a medium data set if you read from or write to less than 1 TB of data.
For a medium data set, you can read 1 TB of data from an Amazon Redshift source to HDFS in 5:00:00 (HH:MM:SS) and write data to an Amazon Redshift target in 5:00:00 (HH:MM:SS).
The following table describes the sizing recommendations for the system that processes medium data sets:
Requirement | Sizing Recommendation | Hadoop Services Recommendation
VCPUs       | 2                     | 5 * 6 = 30
Memory      | 8 GB                  | 6 * 6 = 36 GB
Storage     | 14 GB                 | 1 * 3 = 3 TB
Large data set
A data set is considered to be a large data set if you read from or write to less than 10 TB of data.
For a large data set, you can read 10 TB of data from an Amazon Redshift source to HDFS in 5:00:00 (HH:MM:SS) and write data to an Amazon Redshift target in 5:00:00 (HH:MM:SS).
The following table describes the sizing recommendations for the system that processes large data sets:
Requirement | Sizing Recommendation | Hadoop Services Recommendation
VCPUs       | 2                     | 5 * 60 = 300
Memory      | 8 GB                  | 6 * 60 = 360 GB
Storage     | 14 GB                 | 10 * 3 = 30 TB
The following image shows how the hardware requirements change with the size of the data set and with the individual component settings:
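The Hadoop services values in the tables above scale by simple multipliers as the data set grows, while the base sizing recommendation stays constant at 2 VCPUs, 8 GB of memory, and 14 GB of storage. The following Python sketch is only an illustrative summary of that arithmetic: the thresholds and products are taken directly from the tables, but the function and constant names are hypothetical and not part of any Informatica or Amazon Redshift API.

```python
# Illustrative only: reproduces the arithmetic shown in the sizing tables above.
# The thresholds and multipliers come from the tables; all names here are hypothetical.

TB_IN_GB = 1024  # 1 TB expressed in GB

# Hadoop services recommendation per data set category: (VCPUs, memory in GB, storage in GB).
HADOOP_SERVICES_SIZING = {
    "sandbox": (4 * 1,  2 * 1,  1 * 2),              # < 1 GB of data
    "small":   (5 * 1,  6 * 1,  100 * 3),            # < 100 GB of data
    "medium":  (5 * 6,  6 * 6,  1 * 3 * TB_IN_GB),   # < 1 TB of data
    "large":   (5 * 60, 6 * 60, 10 * 3 * TB_IN_GB),  # < 10 TB of data
}

# The "Sizing Recommendation" column is identical in all four tables.
BASE_SIZING = {"vcpus": 2, "memory_gb": 8, "storage_gb": 14}


def data_set_category(volume_gb: float) -> str:
    """Map a data volume in GB to the data set category used in the tables."""
    if volume_gb < 1:
        return "sandbox"
    if volume_gb < 100:
        return "small"
    if volume_gb < 1 * TB_IN_GB:
        return "medium"
    if volume_gb < 10 * TB_IN_GB:
        return "large"
    raise ValueError("Volumes of 10 TB or more are outside the documented guidelines.")


if __name__ == "__main__":
    volume_gb = 50  # example: a 50 GB data set falls into the small category
    category = data_set_category(volume_gb)
    vcpus, memory_gb, storage_gb = HADOOP_SERVICES_SIZING[category]
    print(f"{volume_gb} GB -> {category}: {vcpus} VCPUs, "
          f"{memory_gb} GB memory, {storage_gb} GB storage for Hadoop services")
```

Read this way, Hadoop services storage is roughly three times the data volume from the small data set upward, while VCPUs and memory grow with the same scale factor (1, 6, 60) across the small, medium, and large categories.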
