Amazon S3 Data Object Read Operation Properties

Amazon S3 data object read operation properties include run-time properties that apply to the Amazon S3 data object.
The Developer tool displays advanced properties for the Amazon S3 data object operation in the Advanced view. The following table describes the advanced properties for an Amazon S3 data object read operation:

Source Type
Select the type of source from which you want to read data. You can select the following source types:
  • File
  • Directory
Default is File. Applicable when you run a mapping in the native environment or on the Spark engine.
For more information, see Directory Source in Amazon S3 Sources.

Folder Path
Bucket name that contains the Amazon S3 source file. If applicable, include the folder name that contains the source file in the <bucket_name>/<folder_name> format.
If you do not provide the bucket name and specify the folder path starting with a slash (/) in the /<folder_name> format, the Data Integration Service appends this folder path to the folder path that you specified in the connection properties.
For example, if you specify the <my_bucket1>/<dir1> folder path in the connection property and the /<dir2> folder path in this property, the Data Integration Service reads the file from the <my_bucket1>/<dir1>/<dir2> folder path.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Data Integration Service reads the file from the <my_bucket2>/<dir2> folder path that you specify in this property.
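
The resolution rule above can be summarized in a short sketch. This is a minimal illustration of the documented behavior, not part of the Developer tool, and the function name resolve_folder_path is hypothetical:

```python
# Hypothetical helper that mirrors the folder path resolution described above.
def resolve_folder_path(connection_folder_path: str, read_folder_path: str) -> str:
    """Return the effective <bucket_name>/<folder_name> path for the source file."""
    if read_folder_path.startswith("/"):
        # No bucket name in this property: append to the connection folder path.
        return connection_folder_path.rstrip("/") + read_folder_path
    # Bucket name included: this property overrides the connection folder path.
    return read_folder_path

print(resolve_folder_path("my_bucket1/dir1", "/dir2"))            # my_bucket1/dir1/dir2
print(resolve_folder_path("my_bucket1/dir1", "my_bucket2/dir2"))  # my_bucket2/dir2
```
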
File Name
Name of the Amazon S3 file from which you want to read data.

Download S3 File in Multiple Parts
Downloads large Amazon S3 objects in multiple parts. When the file size of an Amazon S3 object is greater than 8 MB, you can choose to download the object in multiple parts in parallel.
Applicable to the Blaze and Spark engines. By default, the Data Integration Service downloads the file in multiple parts.
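
For comparison outside the Developer tool, the AWS SDK exposes the same idea through its transfer configuration. The following is a minimal sketch that assumes boto3 is installed and AWS credentials are configured; the bucket, key, and local file names are placeholders:

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Switch to a parallel multipart download once the object exceeds 8 MB.
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # download in parts above 8 MB
    multipart_chunksize=8 * 1024 * 1024,  # size of each part
    max_concurrency=4,                    # parts downloaded in parallel
)

s3 = boto3.client("s3")
s3.download_file("my_bucket1", "dir1/source_file.csv", "/tmp/source_file.csv", Config=config)
```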

Staging Directory
Amazon S3 staging directory. Applicable to the native environment.
Ensure that the user has write permissions on the directory. In addition, ensure that there is sufficient space to enable staging of the entire file.
The default staging directory is the /temp directory on the machine that hosts the Data Integration Service.

Hadoop Performance Tuning Options
Applicable to the Amazon EMR cluster. Provide semicolon-separated name-value attribute pairs to optimize performance when you copy large volumes of data between Amazon S3 and HDFS.
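
For example, a value for this property takes the form <property1>=<value1>;<property2>=<value2>, where the property names and values are placeholders for the Hadoop attributes that you want to set.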

Compression Format
Decompresses data when you read data from Amazon S3. You can decompress the data in the following formats:
  • None
  • Gzip
  • Bzip2
  • Lzo
Default is None. Applicable when you run a mapping in the native environment or on the Spark engine. The Gzip compression format is applicable when you run a mapping in the native environment.
When you read an Avro file, you can decompress the file using only the None compression format. When you read a Parquet file, you can decompress the file using the None, Gzip, or Lzo compression formats. After you decompress an Avro or Parquet file, you can read the file without using any compression format.
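
For reference, decompressing a Gzip object outside the Developer tool looks roughly like the sketch below; boto3 is assumed, and the bucket and key names are placeholders. When you select a compression format in this property, the Data Integration Service performs the equivalent decompression for you.

```python
import gzip
import io

import boto3

# Read a Gzip-compressed object from S3 and decompress it in memory.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my_bucket1", Key="dir1/source_file.csv.gz")

with gzip.GzipFile(fileobj=io.BytesIO(obj["Body"].read())) as compressed:
    for line in compressed:
        print(line.decode("utf-8").rstrip())
```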


Updated July 30, 2020