Table of Contents

Search

  1. Preface
  2. Introduction to PowerExchange for Amazon Redshift
  3. PowerExchange for Amazon Redshift Configuration
  4. Amazon Redshift Connections
  5. PowerExchange for Amazon Redshift Data Objects
  6. Amazon Redshift Mappings
  7. Pushdown Optimization
  8. Amazon Redshift Lookup
  9. Appendix A: Amazon Redshift Datatype Reference
  10. Appendix B: Troubleshooting

PowerExchange for Amazon Redshift User Guide

PowerExchange for Amazon Redshift User Guide

Amazon Redshift Data Object Read Operation Properties

Amazon Redshift Data Object Read Operation Properties

The Data Integration Service reads data from Amazon Redshift based on the data object read operation.
The Developer tool displays advanced properties for the Amazon Redshift data object operation in the Advanced view.
The following table describes the Advanced properties for an Amazon Redshift data object read operation:
Property
Description
S3 Bucket Name
Amazon S3 bucket name for staging the data.
You can also specify the bucket name with the folder path. If you provide an Amazon S3 bucket name that is in a different region than the Amazon Redshift cluster, you must configure the
REGION
attribute in the Unload command options.
The S3 bucket name is not mentioned in the logs of the data preview table in Redshift.
Enable Compression
Compresses the staging files into the Amazon S3 staging directory.
The mapping performance improves when the Data Integration Service compresses the staging files.
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.
Staging Directory Location
Amazon Redshift staging directory.
Specify a directory on the machine that hosts the Data Integration Service.
Applicable when you run a mapping in the native environment.
Unload Options
Unload command options.
Add options to the Unload command to write data from an Amazon Redshift object to an S3 bucket. You can add the following options:
  • DELIMITER
  • REGION
  • PARALLEL
  • NULL
  • AWS_IAM_ROLE
  • ADDQUOTES
Specify a directory on the machine that hosts the Data Integration Service.
Do not use
PARALLEL OFF
if you run a mapping on the Blaze engine.
Treat NULL Value as NULL
Retains the null values when you read data from Amazon Redshift.
Applicable when you run a mapping in the native environment.
Encryption Type
Method you want to use to encrypt data.
Select one of the following values:
  • None
  • SSE-S3
  • SSE-KMS
  • CSE-SMK
For more information, see Data Encryption in Amazon Redshift Sources.
Download S3 Files in Multiple Parts
Downloads large Amazon S3 objects in multiple parts.
When the file size of an Amazon S3 object is greater than 8 MB, you can choose to download the object in multiple parts in parallel.
Applicable when you run a mapping in the native environment.
Multipart Download Threshold Size
Maximum size of an Amazon S3 object in bytes.
When you download large Amazon S3 objects, the large objects are broken into multiple parts. Default is 5 MB.
Applicable when you run a mapping in the native environment.
Pre-SQL
The pre-SQL commands to run a query before you read data from Amazon Redshift. You can also use the UNLOAD or COPY command. The command you specify here is processed as a plain text.
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.
Post-SQL
The post-SQL commands to run a query after you read data to Amazon Redshift. You can also use the UNLOAD or COPY command. The command you specify here is processed as a plain text.
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.
Select Distinct
Selects unique values.
The Data Integration Service includes a
SELECT DISTINCT
statement in the query if you choose this option. Amazon Redshift ignores trailing spaces. Therefore, the Data Integration Service might extract fewer rows than expected.
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.
SQL Query
Overrides the default SQL query.
Enclose column names in double quotes. The SQL query is case sensitive. Specify an SQL statement supported by the Amazon Redshift database.
When you specify the columns in the SQL query, ensure that the column name matches the source column name that are mapped in the mapping.
When you use the date and timestamp data types, you need to specify the format
YYYY-MM-DD HH24:MI:SS.US
to take care of the data conversion.
For example, for a Redshift table
sample
that contains the fields
d_date
of the date data type and
t_timestamp
of the timestamp data type, specify the SQL query in the following format:
select (to_char(cast ( sample.d_date as timestamp ),'YYYY-MM-DD HH24:MI:SS.US')), (to_char(sample.t_timestamp,'YYYY-MM-DD HH24:MI:SS.US')) FROM sample;
Applicable when you run a mapping in the native environment, on the Spark engine, or on the Databricks Spark engine.

0 COMMENTS

We’d like to hear from you!