Table of Contents

  1. Preface
  2. Introduction to PowerExchange for Amazon S3
  3. PowerExchange for Amazon S3 Configuration Overview
  4. Amazon S3 Connections
  5. PowerExchange for Amazon S3 Data Objects
  6. PowerExchange for Amazon S3 Mappings
  7. PowerExchange for Amazon S3 Lookups
  8. Appendix A: Amazon S3 Data Type Reference
  9. Appendix B: Troubleshooting

PowerExchange for Amazon S3 User Guide

Amazon S3 Data Object Write Operation Properties

Amazon S3 data object write operation properties include run-time properties that apply to the Amazon S3 data object.
The Developer tool displays the advanced properties for the Amazon S3 data object operation in the Advanced view.
By default, the Data Integration Service uploads the Amazon S3 file in multiple parts.
The following list describes the advanced properties for an Amazon S3 data object write operation:
Overwrite File(s) If Exists
Overwrites existing files at the target location. Select the check box if you want to overwrite existing files. Default is true.
For more information about Overwrite File(s) If Exists, see Overwriting Existing Files.
Folder Path
Bucket name or folder path of the Amazon S3 target file that you want to overwrite. If applicable, include the folder name that contains the target file in the <bucket_name>/<folder_name> format.
If you do not provide the bucket name and specify the folder path starting with a slash (/) in the /<folder_name> format, the Data Integration Service appends this folder path to the folder path that you specified in the connection properties. For example, if you specify the <my_bucket1>/<dir1> folder path in the connection property and the /<dir2> folder path in this property, the Data Integration Service writes the file in the <my_bucket1>/<dir1>/<dir2> folder path.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Data Integration Service writes the file in the <my_bucket2>/<dir2> folder path that you specify in this property.
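The append-or-replace rule is simple enough to express in a few lines. The following is an illustrative Java sketch of the documented behavior, not the product's internal code; the class and method names are hypothetical:

    public class FolderPathSketch {
        // Mirrors the documented rule: an operation path that starts with "/"
        // is appended to the connection folder path, while a path that begins
        // with a bucket name replaces the connection folder path entirely.
        static String resolveFolderPath(String connectionPath, String operationPath) {
            if (operationPath.startsWith("/")) {
                return connectionPath + operationPath;    // <my_bucket1>/<dir1> + /<dir2>
            }
            return operationPath;                         // <my_bucket2>/<dir2> is used as-is
        }

        public static void main(String[] args) {
            // Prints my_bucket1/dir1/dir2 and my_bucket2/dir2, matching the
            // two examples in the description above.
            System.out.println(resolveFolderPath("my_bucket1/dir1", "/dir2"));
            System.out.println(resolveFolderPath("my_bucket1/dir1", "my_bucket2/dir2"));
        }
    }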
File Name
Name of the Amazon S3 target file that you want to overwrite.
When you run a mapping on the Blaze engine to write data to a target, do not use a semicolon in the file name, or the mapping fails.
Encryption Type
Method you want to use to encrypt data.
Select one of the following values:
  • None
  • Client Side Encryption
  • Server Side Encryption
  • Server Side Encryption with KMS
For more information, see Amazon S3 Data Encryption.
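The two server-side options correspond to standard Amazon S3 encryption settings. As an illustration of what they map to in the AWS SDK for Java, consider the sketch below; the bucket, keys, file path, and KMS key ID are placeholders, and this is not PowerExchange's internal code:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.ObjectMetadata;
    import com.amazonaws.services.s3.model.PutObjectRequest;
    import com.amazonaws.services.s3.model.SSEAwsKeyManagementParams;

    import java.io.File;

    public class EncryptionSketch {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            File file = new File("/tmp/target.csv");

            // Server Side Encryption: Amazon S3 encrypts the object with
            // S3-managed keys (SSE-S3, AES-256).
            ObjectMetadata metadata = new ObjectMetadata();
            metadata.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);
            s3.putObject(new PutObjectRequest("my_bucket", "dir/sse.csv", file)
                    .withMetadata(metadata));

            // Server Side Encryption with KMS: Amazon S3 encrypts the object
            // with the AWS KMS key that you specify (key ID is a placeholder).
            s3.putObject(new PutObjectRequest("my_bucket", "dir/sse-kms.csv", file)
                    .withSSEAwsKeyManagementParams(
                            new SSEAwsKeyManagementParams("1234abcd-12ab-34cd-56ef-1234567890ab")));

            // Client Side Encryption would instead use the SDK's
            // AmazonS3EncryptionClient, which encrypts data before upload.
        }
    }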
Staging Directory
Amazon S3 staging directory.
Ensure that the user has write permissions on the directory. In addition, ensure that there is sufficient space to enable staging of the entire file.
Default staging directory is the /temp directory on the machine that hosts the Data Integration Service.
Applicable when you run a mapping in the native environment.
File Merge
Merges the target files into a single file.
Applicable when you run a mapping on the Blaze engine.
Hadoop Performance Tuning Options
Provide semicolon-separated name-value attribute pairs to optimize performance when you copy large volumes of data between Amazon S3 and HDFS. Applicable to the Amazon EMR cluster.
Applicable when you run a mapping in the native environment.
For more information about Hadoop performance tuning options, see Hadoop Performance Tuning Options for EMR Distribution.
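The attribute names that PowerExchange accepts are listed in Hadoop Performance Tuning Options for EMR Distribution, not here. The pair below uses generic Hadoop S3A settings purely to illustrate the semicolon-separated name-value format, and is not necessarily a value the product accepts:

    fs.s3a.connection.maximum=100;fs.s3a.threads.max=20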
Compression Format
Compresses data when you write data to Amazon S3.
You can compress the data in the following formats:
  • None
  • Bzip2
  • Deflate
  • Gzip
  • Lzo
  • Snappy
  • Zlib
Default is None.
You can write files that use the Deflate, Gzip, Snappy, Lzo, and Zlib compression formats in the native environment or on the Spark and Databricks Spark engines.
You can write files that use the Bzip2 compression format on the Spark engine.
For more information about compression formats, see Data Compression in Amazon S3 Sources and Targets.
Object Tags
Add single or multiple tags to the objects stored in the Amazon S3 bucket. You can either enter the key-value pairs or specify the file path that contains the key-value pairs.
Applicable when you run a mapping in the native environment or on the Spark and Databricks Spark engines to write a flat file to the target.
For more information about object tags, see Object Tag.
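For illustration, this is how key-value tags attach to an object with the AWS SDK for Java; the tag keys and values are made-up examples, and PowerExchange applies tags through this property rather than through user code:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.ObjectTagging;
    import com.amazonaws.services.s3.model.PutObjectRequest;
    import com.amazonaws.services.s3.model.Tag;

    import java.io.File;
    import java.util.Arrays;

    public class ObjectTagSketch {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

            // Attach key-value tags to the object at upload time.
            PutObjectRequest request = new PutObjectRequest(
                    "my_bucket", "dir/target.csv", new File("/tmp/target.csv"))
                    .withTagging(new ObjectTagging(Arrays.asList(
                            new Tag("department", "finance"),
                            new Tag("classification", "internal"))));
            s3.putObject(request);
        }
    }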
TransferManager Thread Pool Size
The number of threads to write data in parallel. Default is 10.
PowerExchange for Amazon S3 uses the AWS TransferManager API to upload a large object in multiple parts to Amazon S3.
When the file size is more than 5 MB, you can configure multipart upload to upload the object in multiple parts in parallel. If you set the value of TransferManager Thread Pool Size to greater than 50, the value reverts to 50.
Applicable when you run a mapping in the native environment to write a flat file to the target.
Part Size
The part size in bytes to upload an Amazon S3 object. Default is 5 MB.
Applicable when you run a mapping in the native environment to write a flat file to the target.
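Because the guide names the AWS TransferManager API, a minimal stand-alone sketch of a multipart upload with an explicit thread pool and part size may clarify what TransferManager Thread Pool Size and Part Size control. The bucket, key, and file path are placeholders, and mapping the two properties onto these builder calls is an assumption, not the product's actual code:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.transfer.TransferManager;
    import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
    import com.amazonaws.services.s3.transfer.Upload;

    import java.io.File;
    import java.util.concurrent.Executors;

    public class MultipartUploadSketch {
        public static void main(String[] args) throws InterruptedException {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

            // 10 threads and 5 MB parts mirror the documented defaults of the
            // TransferManager Thread Pool Size and Part Size properties.
            TransferManager tm = TransferManagerBuilder.standard()
                    .withS3Client(s3)
                    .withExecutorFactory(() -> Executors.newFixedThreadPool(10))
                    .withMinimumUploadPartSize(5L * 1024 * 1024)
                    .withMultipartUploadThreshold(5L * 1024 * 1024)  // files above 5 MB upload in parts
                    .build();

            Upload upload = tm.upload("my_bucket", "dir/target.csv",
                    new File("/tmp/target.csv"));
            upload.waitForCompletion();   // blocks until all parts finish
            tm.shutdownNow();
        }
    }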
Temporary Credential Duration
The time duration during which an IAM user can use the dynamically generated temporary credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set a maximum duration of up to 12 hours in the AWS console and then enter the same duration in this property.
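Temporary credentials of this kind come from AWS STS. Below is a hedged sketch using the AWS SDK for Java's AssumeRole call, where DurationSeconds plays the role of this property and 900 seconds is both the documented default and the minimum STS allows; the role ARN and session name are placeholders, and whether PowerExchange uses exactly this call is an assumption:

    import com.amazonaws.services.securitytoken.AWSSecurityTokenService;
    import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder;
    import com.amazonaws.services.securitytoken.model.AssumeRoleRequest;
    import com.amazonaws.services.securitytoken.model.Credentials;

    public class TemporaryCredentialSketch {
        public static void main(String[] args) {
            AWSSecurityTokenService sts =
                    AWSSecurityTokenServiceClientBuilder.defaultClient();

            // DurationSeconds corresponds to the Temporary Credential Duration
            // property; longer durations require raising the role's maximum
            // session duration in the AWS console first.
            Credentials credentials = sts.assumeRole(new AssumeRoleRequest()
                    .withRoleArn("arn:aws:iam::123456789012:role/my-s3-role")
                    .withRoleSessionName("pwx-s3-session")
                    .withDurationSeconds(900))
                    .getCredentials();

            System.out.println("Credentials expire at: " + credentials.getExpiration());
        }
    }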
Stream Rollover Size in GB
Applicable to streaming mappings.
Stream Rollover Time in hours
Applicable to streaming mappings.
Interim Directory
Applicable to streaming mappings.
Partition Option
Select one of the following partition options when you configure a dynamic mapping:
  • None. Partitioning is not configured.
  • Last N Columns Partitioned. The last N columns are selected for partitioning.
  • Partition Column Names. Comma-separated column names are selected for partitioning.
Partition Arguments
The number or names of partition columns.
If you select None as the partition option, do not specify a partition argument.
If you select Last N Columns Partitioned as the partition option, specify an integer value as the partition argument.
If you select Partition Column Names as the partition option, specify comma-separated column names as the partition argument.
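The guide does not show the resulting target layout. Assuming the engine writes Hive-style key=value directories, which is a common Spark convention and an assumption here rather than documented behavior, partitioning on hypothetical country and year columns might produce paths such as:

    <target_folder>/country=US/year=2024/part-00000
    <target_folder>/country=CA/year=2023/part-00001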
