Table of Contents

Search

  1. Preface
  2. Introduction to PowerExchange for Amazon Redshift
  3. PowerExchange for Amazon Redshift Configuration
  4. Amazon Redshift Connections
  5. PowerExchange for Amazon Redshift Data Objects
  6. Amazon Redshift Mappings
  7. Pushdown Optimization
  8. Amazon Redshift Lookup
  9. Appendix A: Amazon Redshift Datatype Reference
  10. Appendix B: Troubleshooting

PowerExchange for Amazon Redshift User Guide

PowerExchange for Amazon Redshift User Guide

Amazon Redshift Data Object Write Operation Properties

Amazon Redshift Data Object Write Operation Properties

Amazon Redshift data object write operation properties include run-time properties that apply to the Amazon Redshift data object.
The Developer tool displays advanced properties for the Amazon Redshift data object operation in the
Advanced
view. The following table describes the Advanced properties for an Amazon Redshift data object write operation:
Property
Description
S3 Bucket Name
Amazon S3 bucket name for staging the data.
You can also specify the bucket name with the folder path. Use an S3 bucket in the same region as your Amazon Redshift cluster.
Redshift data preview does not log the staging data for an S3 bucket.
Enable Compression
Compresses the staging files before writing to Amazon Redshift.
By default, the compression is enabled.
Mapping performance improves when the Data Integration Service compresses the staged files.
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.
Staging Directory Location
Amazon Redshift staging directory.
Specify a directory on the machine that hosts the Data Integration Service.
Applicable when you run a mapping in the native environment.
Batch Size
Minimum number of rows in a batch. Enter a number greater than 0.
Default is 2000000.
Applicable when you run a mapping in the native environment.
Max Errors per Upload Batch for INSERT
The number of error rows that causes an upload insert batch to fail.
Enter a positive integer. Default is 1.
If the number of errors is equal to or greater than the property value, the Data Integration Service writes the entire batch to the error file.
Truncate Target Table Before Data Load
Deletes all the existing data in the Amazon Redshift target table before loading new data.
Require Null Value For Char And Varchar
Replaces the string value with NULL when uploading data to Amazon Redshift columns of Char and Varchar data types.
Default is an empty string.
When you run a mapping to write null values to a table that contains a single column of the Int, Bigint, numeric, real, or double data type, the mapping fails. You must provide a value other than the default value in the
Require Null Value For Char And Varchar
property.
WaitTime In Seconds For S3 File Consistency
The number of seconds to wait for the Data Integration Service to make the staged files consistent with the list of files available on Amazon S3.
Default is 0.
Applicable when you run a mapping in the native environment or on the Blaze engine.
Copy Options
Copy command options.
Enables you to add additional options to the copy command for writing data from an Amazon S3 source to an Amazon Redshift target when the default delimiter comma (\036) or double-quote (\037) is used in the data.
You can add the following options:
  • DELIMITER
  • ACCEPTINVCHARS
  • QUOTE
  • REGION
  • COMPUPDATE
  • AWS_IAM_ROLE
Specify a directory on the machine that hosts the Data Integration Service.
S3 Server Side Encryption
Indicates that Amazon S3 encrypts data during upload and decrypts data at the time of access.
S3 Client Side Encryption
Indicates that the Data Integration Service encrypts data by using a private encryption key.
If you enable both server side and client side encryption, the Data Integration Service ignores the server side encryption.
Applicable when you run a mapping in the native environment.
Analyze Target Table
Runs an
ANALYZE
command on the target table.
The query planner on Amazon Redshift updates the statistical metadata to build and choose optimal plans to improve the efficiency of queries.
Vacuum Target Table
Recovers disk space and sorts rows in a specified table or all tables in the database.
Default is None.
You can select the following recovery options:
None
Does not sort rows or recover disk space.
Full
Sorts the specified table or all tables in the database and recovers disk space occupied by rows marked for deletion by previous update and delete operations.
Sort Only
Sorts the specified table or all tables in the database without recovering space freed by deleted rows.
Delete Only
Recovers disk space occupied by rows marked for deletion by previous update and delete operations, and compresses the table to free up used space.
Prefix To Retain For Staging Files On S3
Retains staging files on Amazon S3.
Provide both a directory prefix and a file prefix separated by a slash (/) or only a file prefix to retain staging files on Amazon S3. For example,
backup_dir/backup_file
or
backup_file
.
If you run a mapping on the Blaze engine, the file prefix is not retained because the file creation is dynamic on the Blaze engine. You must specify only a directory prefix. For example,
backup_dir
.
Success File Directory
Directory for the Amazon Redshift success file.
Specify a directory on the machine that hosts the Data Integration Service.
Applicable when you run a mapping in the native environment.
Error File Directory
Directory for the Amazon Redshift error file.
Specify a directory on the machine that hosts the Data Integration Service.
Applicable when you run a mapping in the native environment.
Treat Source Rows As
Overrides the Amazon Redshift target.
Default is
INSERT
.
Select one of the following options:
None
You can use an Update Strategy transformation to write data to an Amazon Redshift target when you select this option.
You can configure an Update Strategy transformation for an Amazon Redshift target in the native environment.
INSERT
If enabled, the Data Integration Service inserts all rows flagged for insert. If disabled, the Data Integration Service rejects the rows flagged for insert. By default, the insert operation is enabled.
DELETE
If enabled, the Data Integration Service deletes all rows flagged for delete. If disabled, the Data Integration Service rejects all rows flagged for delete.
To perform a delete operation, you must map the primary key column and at least one column other than primary key column.
UPDATE and UPSERT
Performs update and upsert operations. To perform an update operation, you must map the primary key column and at least one column other than primary key column. You can select the following data object operation attributes:
  • Update as Update: The Data Integration Service updates all rows as updates.
  • Update else Insert: The Data Integration Service updates existing rows and inserts other rows as if marked for insert.
Minimum Upload Part Size
Minimum size of the Amazon Redshift object to upload.
Default is 5 MB.
Applicable when you run a mapping in the native environment.
Transfer Manager Thread Pool Size
The number of threads to write data in parallel.
Default is 10.
Applicable when you run a mapping in the native environment.
Pre-SQL
The pre-SQL commands to run a query before you write data from Amazon Redshift. You can also use the UNLOAD or COPY command. The command you specify here is processed as a plain text.
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.
Post-SQL
The post-SQL commands to run a query after you write data to Amazon Redshift. You can also use the UNLOAD or COPY command. The command you specify here is processed as a plain text.
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.
Schema Name
Overrides the default schema name.
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.
Preserve record order on write
Retains record order when the Data Integration Service writes data from a CDC source to the target.
Use this property when you create a mapping to capture the changed record from a CDC source. This property enables you to avoid inconsistencies between the CDC source and target.
Number of files per batch
Specifies the number of staging files for each batch.
If you do not provide the number of files, the Integration Service calculates the number of staging files.
Applicable when you run a mapping in the native environment.
Target Table Name
Overwrites the default target table name.
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.
Recovery Schema Name
Not applicable.
Target Schema Strategy
Target schema strategy for the Amazon Redshift target table.
You can select one of the following target schema strategies:
  • RETAIN - Retain existing target schema
  • CREATE - Create or replace table at run time
    When you select the
    CREATE
    option, you must provide the value of the
    Schema Name
    property to run the mapping successfully.
  • Assign Parameter
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.
For more information about Target Schema Strategy, see Target Schema Strategy

0 COMMENTS

We’d like to hear from you!