PowerExchange for Amazon S3 User Guide

Hadoop Performance Tuning Options for EMR Distribution

You can use Hadoop Performance Tuning Options to optimize performance on the Amazon EMR distribution when you copy large volumes of data between Amazon S3 buckets and HDFS.
You must provide semicolon-separated name-value attribute pairs for Hadoop Performance Tuning Options.
Use the following parameters for Hadoop Performance Tuning Options:
  • mapreduce.map.java.opts
  • fs.s3a.fast.upload
  • fs.s3a.multipart.threshold
  • fs.s3a.multipart.size
  • mapreduce.map.memory.mb
The following sample shows the recommended values for these parameters:
mapreduce.map.java.opts=-Xmx4096m;fs.s3a.fast.upload=true;fs.s3a.multipart.threshold=33554432;fs.s3a.multipart.size=33554432;mapreduce.map.memory.mb=4096
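The semicolon-separated format is easy to mistype by hand, so it can help to generate the string from a set of name-value pairs. The following Python sketch is one way to do that. The helper functions build_tuning_options and parse_tuning_options are hypothetical names used for illustration only and are not part of PowerExchange or Hadoop. The sketch also shows that the sample value 33554432 corresponds to 32 MiB expressed in bytes.

```python
# Hypothetical helpers for illustration; not part of PowerExchange or Hadoop.

MIB = 1024 * 1024  # bytes in one mebibyte


def build_tuning_options(options):
    """Join name-value pairs into one semicolon-separated string."""
    parts = []
    for name, value in options.items():
        # Render Python booleans in lowercase, as in the sample string.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{name}={value}")
    return ";".join(parts)


def parse_tuning_options(text):
    """Split a semicolon-separated string back into a dict of strings."""
    pairs = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only, so values may contain "=".
        name, _, value = item.partition("=")
        pairs[name] = value
    return pairs


# Values taken from the sample above; 32 * MIB equals 33554432 bytes.
recommended = {
    "mapreduce.map.java.opts": "-Xmx4096m",
    "fs.s3a.fast.upload": True,
    "fs.s3a.multipart.threshold": 32 * MIB,
    "fs.s3a.multipart.size": 32 * MIB,
    "mapreduce.map.memory.mb": 4096,
}

tuning_string = build_tuning_options(recommended)
print(tuning_string)
```

You can paste the printed string into the Hadoop Performance Tuning Options field. Parsing the string back with parse_tuning_options is a quick way to check an existing value for misspelled parameter names.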
