PowerExchange Adapters for Informatica 10.5
Source Type
Select the type of source from which you want to read data. You can select the following source types:
- File
- Directory. Applicable when you run a mapping in the native environment or on the Spark and Databricks Spark engine.
For more information about the source type, see Directory Source in Amazon S3 Sources.
Folder Path
Bucket name or folder path of the Amazon S3 source file that you want to override.
If applicable, include the folder name that contains the source file in the <bucket_name>/<folder_name> format.
If you do not provide the bucket name and specify the folder path starting with a slash (/), in the /<folder_name> format, the Data Integration Service appends this folder path to the folder path that you specified in the connection properties.
For example, if you specify the <my_bucket1>/<dir1> folder path in the connection property and the /<dir2> folder path in this property, the Data Integration Service reads the file from the <my_bucket1>/<dir1>/<dir2> folder path.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Data Integration Service reads the file from the <my_bucket2>/<dir2> folder path that you specify in this property.
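The resolution rule above can be sketched in a few lines of Python. This is an illustration of the documented behavior only, not adapter code; the function name is hypothetical.

```python
def resolve_folder_path(connection_path: str, property_path: str) -> str:
    """Sketch of the documented Folder Path resolution rule (hypothetical helper)."""
    if property_path.startswith("/"):
        # No bucket name given: append to the connection's folder path.
        return connection_path.rstrip("/") + property_path
    # Bucket-qualified path: overrides the connection's folder path.
    return property_path

assert resolve_folder_path("my_bucket1/dir1", "/dir2") == "my_bucket1/dir1/dir2"
assert resolve_folder_path("my_bucket1/dir1", "my_bucket2/dir2") == "my_bucket2/dir2"
```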
File Name
Name of the Amazon S3 source file that you want to override.
Allow Wildcard Characters
Indicates whether you want to use wildcard characters in the source file name.
When you run a mapping in the native environment to read a flat file and select this option, you can use the asterisk (*) wildcard character in the source file name.
When you run a mapping in the native environment or on the Spark and Databricks Spark engine to read an Avro, JSON, ORC, or Parquet file and select this option, you can use the question mark (?) and asterisk (*) wildcard characters in the source file name.
The question mark (?) matches exactly one occurrence of any character. The asterisk (*) matches zero or more occurrences of any character.
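Python's fnmatch module follows the same ? and * semantics, so it makes a convenient illustration of how the patterns match. The file names below are made up.

```python
from fnmatch import fnmatch

files = ["sales_2023.parquet", "sales_2024.parquet", "sales_backup.parquet"]

# ? matches exactly one character: only the four-digit year variants match.
print([f for f in files if fnmatch(f, "sales_202?.parquet")])
# ['sales_2023.parquet', 'sales_2024.parquet']

# * matches zero or more characters: all three files match.
print([f for f in files if fnmatch(f, "sales_*.parquet")])
```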
Staging Directory
Amazon S3 staging directory.
Ensure that the user has write permissions on the directory and that there is sufficient space to stage the entire file.
The default staging directory is the temporary directory on the machine that hosts the Data Integration Service.
Applicable when you run a mapping in the native environment.
Hadoop Performance Tuning Options
Semicolon-separated name-value attribute pairs that optimize performance when you copy large volumes of data between Amazon S3 and HDFS.
Applicable to the Amazon EMR cluster.
For more information about Hadoop performance tuning options, see Hadoop Performance Tuning Options for EMR Distribution.
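As a sketch of the expected name-value format, the snippet below builds and parses a semicolon-separated string. The attribute names shown are standard Hadoop S3A settings used purely as placeholders, not values from this guide; see the EMR tuning topic for the attributes the adapter actually supports.

```python
# Hypothetical value for the Hadoop Performance Tuning Options property.
# The attribute names are placeholders, not values from this guide.
tuning_options = "fs.s3a.threads.max=50;fs.s3a.connection.maximum=100"

# Parse the semicolon-separated name-value pairs.
pairs = dict(attr.split("=", 1) for attr in tuning_options.split(";"))
print(pairs)  # {'fs.s3a.threads.max': '50', 'fs.s3a.connection.maximum': '100'}
```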
Compression Format
Decompresses data when you read data from Amazon S3. Default is None.
You can read files that use the deflate, snappy, zlib, Gzip, and Lzo compression formats in the native environment or on the Spark and Databricks Spark engine. You can read files that use the Bzip2 compression format on the Spark engine.
For more information about compression formats, see Data Compression in Amazon S3 Sources and Targets.
Download Part Size
Size of each part in which the Data Integration Service downloads an Amazon S3 object. Default is 5 MB.
When the file size of an Amazon S3 object is greater than 8 MB, you can choose to download the object in multiple parts in parallel. By default, the Data Integration Service downloads the file in multiple parts.
Applicable when you run a mapping in the native environment.
Multiple Download Threshold
Minimum file size at which the Data Integration Service downloads an Amazon S3 object in multiple parts. Default is 10 MB.
To download the object in multiple parts in parallel, ensure that the file size of the Amazon S3 object is greater than the value you specify in this property.
Applicable when you run a mapping in the native environment.
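These two properties mirror the multipart-download settings found in the AWS SDKs. As an illustration only, not the adapter's internal implementation, the equivalent configuration in boto3 looks like this; the bucket, key, and local path are hypothetical.

```python
import boto3
from boto3.s3.transfer import TransferConfig

# 10 MB threshold before multipart download kicks in, 5 MB per part;
# these match the defaults for Multiple Download Threshold and
# Download Part Size described above.
config = TransferConfig(
    multipart_threshold=10 * 1024 * 1024,
    multipart_chunksize=5 * 1024 * 1024,
)

s3 = boto3.client("s3")
s3.download_file("my_bucket1", "dir1/data.parquet", "/tmp/data.parquet", Config=config)
```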
Temporary Credential Duration
The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set the maximum duration to up to 12 hours in the AWS console and then enter the same duration in this property.
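For context, temporary credentials with a bounded duration are typically requested through AWS STS. The boto3 call below is a minimal sketch of that mechanism, not the adapter's internal code; the role ARN and session name are hypothetical.

```python
import boto3

sts = boto3.client("sts")
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/s3-read-role",  # hypothetical role
    RoleSessionName="informatica-s3-read",                  # hypothetical name
    DurationSeconds=900,  # matches the Temporary Credential Duration default
)
# Temporary AccessKeyId, SecretAccessKey, and SessionToken valid for 900 seconds.
credentials = response["Credentials"]
```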