Table of Contents

  1. Preface
  2. Introduction to PowerExchange for Amazon S3
  3. PowerExchange for Amazon S3 Configuration Overview
  4. Amazon S3 Connections
  5. PowerExchange for Amazon S3 Data Objects
  6. PowerExchange for Amazon S3 Mappings
  7. Appendix A: Amazon S3 Data Type Reference
  8. Appendix B: Troubleshooting

PowerExchange for Amazon S3 User Guide

Amazon S3 Data Object Read Operation Properties

Amazon S3 data object read operation properties include run-time properties that apply to the Amazon S3 data object.
The Developer tool displays advanced properties for the Amazon S3 data object operation in the Advanced view. The following table describes the advanced properties for an Amazon S3 data object read operation:

Source Type
Select the type of source from which you want to read data. You can select the following source types:
  • File
  • Directory
Default is File. Applicable when you run a mapping in the native environment or on the Spark and Databricks Spark engine.
For more information about source type, see Directory Source in Amazon S3 Sources.
Folder Path
Bucket name or folder path of the Amazon S3 source file that you want to read.
If applicable, include the folder name that contains the source file in the <bucket_name>/<folder_name> format.
If you do not provide the bucket name and specify the folder path starting with a slash (/) in the /<folder_name> format, the folder path is appended to the folder path that you specified in the connection properties.
For example, if you specify the <my_bucket1>/<dir1> folder path in the connection property and the /<dir2> folder path in this property, the resulting folder path is <my_bucket1>/<dir1>/<dir2>.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Data Integration Service reads the file from the <my_bucket2>/<dir2> folder path that you specify in this property.
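The appending rule above can be sketched in a few lines of Python. This is only an illustrative sketch of the documented behavior, not product code; the function name is hypothetical.

```python
def resolve_folder_path(connection_path, property_path):
    """Resolve the effective S3 folder path (illustrative sketch).

    Mirrors the documented rule: a property path that starts with a
    slash is appended to the connection folder path; a path that names
    its own bucket overrides the connection path entirely.
    """
    if property_path.startswith("/"):
        # Relative /<folder_name> path: append to the connection folder path.
        return connection_path.rstrip("/") + property_path
    # Full <bucket_name>/<folder_name> path: overrides the connection path.
    return property_path


print(resolve_folder_path("my_bucket1/dir1", "/dir2"))           # my_bucket1/dir1/dir2
print(resolve_folder_path("my_bucket1/dir1", "my_bucket2/dir2")) # my_bucket2/dir2
```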
File Name
Name of the Amazon S3 source file that you want to read.

Allow Wildcard Characters
Indicates whether you want to use wildcard characters for the source file name.
When you run a mapping in the native environment to read a flat file and select this option, you can use the * wildcard character for the source file name.
When you run a mapping in the native environment or on the Spark and Databricks Spark engine to read an Avro, JSON, ORC, or Parquet file and select this option, you can use the ? and * wildcard characters for the source file name.
The question mark character (?) matches exactly one occurrence of any character. The asterisk character (*) matches zero or more occurrences of any character.
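These ? and * semantics are the same as Python's fnmatch rules, which you can use to preview which file names a pattern would select. The file names below are hypothetical examples, not from the product documentation.

```python
import fnmatch

files = ["sales_2023.avro", "sales_2024.avro", "sales.avro", "inventory.avro"]

# '*' matches zero or more characters; '?' matches exactly one character.
print(fnmatch.filter(files, "sales*.avro"))      # all three sales files
print(fnmatch.filter(files, "sales_202?.avro"))  # only the year-stamped files
```

Note that fnmatch also accepts character ranges such as [abc], which goes beyond the ? and * support described here.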
Staging Directory
Amazon S3 staging directory.
Ensure that the user has write permissions on the directory. In addition, ensure that there is sufficient space to enable staging of the entire file.
Default staging directory is the /temp directory on the machine that hosts the Data Integration Service.
Applicable when you run a mapping in the native environment.
Hadoop Performance Tuning Options
Provide semicolon-separated name-value attribute pairs to optimize performance when you copy large volumes of data between Amazon S3 and HDFS.
Applicable to the Amazon EMR cluster.
For more information about Hadoop performance tuning options, see Hadoop Performance Tuning Options for EMR Distribution.
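The expected value is a single string of name=value pairs separated by semicolons. The sketch below only illustrates how such a string decomposes; the attribute names are placeholders, not real tuning options.

```python
def parse_attribute_pairs(value):
    """Split a semicolon-separated string of name=value pairs (sketch)."""
    return dict(pair.split("=", 1) for pair in value.split(";") if pair)


# Placeholder attribute names, shown only to illustrate the format.
print(parse_attribute_pairs("attribute1=value1;attribute2=value2"))
```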
Compression Format
Decompresses data when you read data from Amazon S3.
You can decompress the data in the following formats:
  • None. Select None to decompress files with the deflate, snappy, and zlib formats.
  • Bzip2
  • Gzip
  • Lzo
Default is None.
You can read files that use the deflate, snappy, zlib, Gzip, and Lzo compression formats in the native environment or on the Spark and Databricks Spark engine.
You can read files that use the Bzip2 compression format on the Spark engine.
For more information about compression formats, see Data Compression in Amazon S3 Sources and Targets.
Download Part Size
Part size in which to download an Amazon S3 object in multiple parts.
Default is 5 MB.
When the file size of an Amazon S3 object is greater than 8 MB, you can choose to download the object in multiple parts in parallel. By default, the Data Integration Service downloads the file in multiple parts.
Applicable when you run a mapping in the native environment.
Multiple Download Threshold
Minimum threshold size to download an Amazon S3 object in multiple parts.
Default is 10 MB.
To download the object in multiple parts in parallel, you must ensure that the file size of an Amazon S3 object is greater than the value you specify in this property.
Applicable when you run a mapping in the native environment.
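Taken together, the two properties above describe a simple partitioning rule: an object at or below the threshold is fetched in one request, while a larger object is split into part-size byte ranges that can download in parallel. The following is an illustrative sketch of that rule, not the product's implementation.

```python
def plan_download_parts(object_size, part_size, threshold):
    """Plan byte ranges for a multipart download (illustrative sketch).

    Objects no larger than the threshold download as a single range;
    larger objects split into part_size chunks.
    """
    if object_size <= threshold:
        return [(0, object_size - 1)]
    return [
        (start, min(start + part_size, object_size) - 1)
        for start in range(0, object_size, part_size)
    ]


MB = 1024 * 1024
print(plan_download_parts(4 * MB, 5 * MB, 10 * MB))        # one range
print(len(plan_download_parts(23 * MB, 5 * MB, 10 * MB)))  # 5 parts
```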
Temporary Credential Duration
The time duration for which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set the maximum time duration up to 12 hours in the AWS console and then enter the same time duration in this property.
