Connections for INFACore

Connections for INFACore

Amazon Redshift

Amazon Redshift

Create an Amazon Redshift connection to connect to Amazon Redshift. .

Feature snapshot

Operation
Support
Read
Yes
Write
Yes

Before you begin

Before you configure the connection properties, you'll need to get information from your AWS account.
The following video shows you how to get information from your AWS account:

Connection properties

The following table describes the Amazon Redshift connection properties:
Property
Description
Connection Name
Name of the connection.
Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -,
Maximum length is 255 characters.
Username
User name of the Amazon Redshift account.
Password
Password for the Amazon Redshift account.
Access Key ID
Access key to access the Amazon S3 staging bucket.
Enter the access key value based on the following authentication methods:
  • Basic authentication. Enter the actual access key value.
  • IAM authentication. Do not enter the access key value.
  • Temporary security credentials using assume role. Enter access key of an IAM user with no permissions to access the Amazon S3 staging bucket.
  • Assume role for EC2. Do not enter the access key value.
Secret Access Key
Secret access key to access the Amazon S3 staging bucket.
The secret key is associated with the access key and uniquely identifies the account.
Enter the access key value based on the following authentication methods:
  • Basic authentication. Enter the actual access secret value.
  • IAM authentication. Do not enter the access secret value.
  • Temporary security credentials using assume role. Enter access secret of an IAM user with no permissions to access Amazon S3 staging bucket.
  • Assume role for EC2. Do not enter the access secret value.
IAM Role ARN
The Amazon Resource Number (ARN) of the IAM role assumed by the user to use the dynamically generated temporary security credentials.
Set the value of this property if you want to use the temporary security credentials to access the Amazon S3 staging bucket.
For more information about how to get the ARN of the IAM role, see the AWS documentation.
External Id
The external ID for a more secure access to the Amazon S3 bucket when the Amazon S3 staging bucket is in a different AWS account.
Use EC2 Role to Assume Role
Optional. Select the check box to enable the EC2 role to assume another IAM role specified in the IAM Role ARN option.
The EC2 role must have a policy attached with a permission to assume an IAM role from the same or different account.
Master Symmetric Key
A 256-bit AES encryption key in the Base64 format when you enable client-side encryption. You can generate a key using a third-party tool.
JDBC URL
The URL of the Amazon Redshift connection.
Enter the JDBC URL in the following format:
jdbc:redshift://<amazon_redshift_host>:<port_number>/<database_name>
Cluster Region
The AWS cluster region in which the bucket you want to access resides.
Select a cluster region if you choose to provide a custom JDBC URL that does not contain a cluster region name in the
JDBC URL
connection property.
If you select a cluster region in both
Cluster Region
and
JDBC URL
connection properties, the agent ignores the cluster region that you specify in the
JDBC URL
connection property.
To use the cluster region name that you specify in the
JDBC URL
connection property, select
None
as the cluster region in this property.
You can only read data from or write data to the cluster regions supported by AWS SDK.
Select one of the following cluster regions:

    None

    Asia Pacific(Mumbai)

    Asia Pacific(Seoul)

    Asia Pacific(Singapore)

    Asia Pacific(Sydney)

    Asia Pacific(Tokyo)

    Asia Pacific(Hong Kong)

    AWS GovCloud (US)

    AWS GovCloud (US-East)

    Canada(Central)

    China(Bejing)

    China(Ningxia)

    EU(Ireland)

    EU(Frankfurt)

    EU(Paris)

    EU(Stockholm)

    South America(Sao Paulo)

    Middle East(Bahrain)

    US East(N. Virginia)

    US East(Ohio)

    US West(N. California)

    US West(Oregon)

Default is
None
.
Customer Master Key ID
The customer master key ID generated by AWS Key Management Service (AWS KMS) or the ARN of your custom key for cross-account access.
You must generate the customer master key ID for the same region where your Amazon S3 staging bucket resides. You can either enter the customer-generated customer master key ID or the default customer master key ID.

Read properties

The following table describes the advanced source properties that you can configure in the Python code to read from Amazon Redshift:
Property
Description
Read Mode
Specifies the read mode to read data from the Amazon Redshift source.
You can select one the following read modes:
  • Direct. Reads data directly from the Amazon Redshift source without staging the data in Amazon S3.
  • Staging. Reads data from the Amazon Redshift source by staging the data in the S3 bucket.
Default is Staging.
Fetch Size
Determines the number of rows to read in one resultant set from Amazon Redshift. Applies only when you select the
Direct
read mode.
Default is 10000.
If you specify fetch size 0 or if you don't specify a fetch size, the entire data set is read directly at the same time than in batches.
S3 Bucket Name
Amazon S3 bucket name for staging the data.
You can also specify the bucket name with the folder path. If you provide an Amazon S3 bucket name that is in a different region than the Amazon Redshift cluster, you must configure the
REGION
attribute in the Unload command options.
Enable Compression
Compresses the staging files into the Amazon S3 staging directory.
The task performance improves when the Secure Agent compresses the staging files. Default is selected.
Staging Directory Location
Location of the local staging directory.
When you run a task in Secure Agent runtime environment, specify a directory path that is available on the corresponding Secure Agent machine in the runtime environment.
Specify the directory path in the following manner:
<staging directory>
For example,
C:\Temp
. Ensure that you have the write permissions on the directory.
Unload Options
Unload command options.
Add options to the Unload command to extract data from Amazon Redshift and create staging files on Amazon S3. Provide an Amazon Redshift Role Amazon Resource Name (ARN).
You can add the following options:
  • DELIMITER
  • ESCAPE
  • PARALLEL
  • NULL
  • AWS_IAM_ROLE
  • REGION
  • ADDQUOTES
For example: DELIMITER = \036;ESCAPE = OFF;NULL=text;PARALLEL = ON;AWS_IAM_ROLE=arn;aws;iam;;<account ID>;role/<role-name>;REGION = ap-south-1
Specify a directory on the machine that hosts the Secure Agent.
If you do not add the options to the Unload command manually, the Secure Agent uses the default values.
Treat NULL Value as NULL
Retains the null values when you read data from Amazon Redshift.
Encryption Type
Encrypts the data in the Amazon S3 staging directory.
You can select the following encryption types:
  • None
  • SSE-S3
  • SSE-KMS
  • CSE-SMK
Default is None.
Download S3 Files in Multiple Parts
Downloads large Amazon S3 objects in multiple parts.
When the file size of an Amazon S3 object is greater than 8 MB, you can choose to download the object in multiple parts in parallel.
Default is 5 MB.
Multipart Download Threshold Size
The maximum threshold size to download an Amazon S3 object in multiple parts.
Default is 5 MB.
Schema Name
Overrides the default schema name.
You cannot configure a custom query when you use the schema name.
Source Table Name
Overrides the default source table name.
When you select the source type as
Multiple Objects
or
Query
, you cannot use the
Source Table Name
option.
Pre-SQL
The pre-SQL commands to run a query before you read data from Amazon Redshift. You can also use the UNLOAD or COPY command. The command you specify here is processed as a plain text.
Post-SQL
The post-SQL commands to run a query after you write data to Amazon Redshift. You can also use the UNLOAD or COPY command. The command you specify here is processed as a plain text.
Select Distinct
Selects unique values.
The Secure Agent includes a
SELECT DISTINCT
statement if you choose this option. Amazon Redshift ignores trailing spaces. Therefore, the Secure Agent might extract fewer rows than expected.
If you select the source type as query or use the
SQL Query
property and select the
Select Distinct
option, the Secure Agent ignores the
Select Distinct
option.
SQL Query
Overrides the default SQL query.
Enclose column names in double quotes. The SQL query is case sensitive. Specify an SQL statement supported by the Amazon Redshift database.
When you specify the columns in the SQL query, ensure that the column name in the query matches the source column name in the mapping.
Temporary Credential Duration
The time duration during which an IAM user can use the dynamically generated temporarily credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set the time duration up to a maximum of 12 hours in the AWS console and then enter the same time duration in this property.
Tracing Level
Use the verbose tracing level to get the amount of detail that appears in the log for the Source transformation.

Write properties

The following table describes the advanced target properties that you can configure in the Python code to write to Amazon Redshift:
Property
Description
S3 Bucket Name
Amazon S3 bucket name for writing the files to Amazon Redshift target.
You can also specify the bucket name with the folder path. If you provide an Amazon S3 bucket name that is in a different region than the Amazon Redshift cluster, you must configure the
REGION
attribute in the Copy command options.
Enable Compression
Compresses the staging files before writing the files to Amazon Redshift.
The task performance improves when the Secure Agent compresses the staged files. Default is selected.
Staging Directory Location
Location of the local staging directory.
When you run a task in Secure Agent runtime environment, specify a directory path that is available on the corresponding Secure Agent machine in the runtime environment.
Specify the directory path in the following manner:
<staging directory>
For example,
C:\Temp
. Ensure that you have the write permissions on the directory.
Batch Size
Minimum number of rows in a batch.
Enter a number greater than 0. Default is 2000000.
Max Errors per Upload Batch for INSERT
Number of error rows that causes an upload insert batch to fail.
Enter a positive integer. Default is 1.
If the number of errors is equal to or greater than the property value, the Secure Agent writes the entire batch to the error file.
Truncate Target Table Before Data Load
Deletes all the existing data in the Amazon Redshift target table before loading new data.
Require Null Value For Char and Varchar
Replaces the string value with NULL when you write data to Amazon Redshift columns of Char and Varchar data types.
Default is an empty string.
When you run a mapping to write null values to a table that contains a single column of the Int, Bigint, numeric, real, or double data type, the mapping fails. You must provide a value other than the default value in the
Require Null Value For Char And Varchar
property.
WaitTime In Seconds For S3 File Consistency
Number of seconds to wait for the Secure Agent to make the staged files consistent with the list of files available on Amazon S3.
Default is 0.
Copy Options
Copy command options.
Add options to the Copy command to write data from Amazon S3 to the Amazon Redshift target when the default delimiter comma (,) or double-quote (") is used in the data. Provide the Amazon Redshift Role Amazon Resource Name (ARN).
You can add the following options:
  • DELIMITER
  • ACCEPTINVCHARS
  • QUOTE
  • COMPUPDATE
  • AWS_IAM_ROLE
  • REGION
For example:
DELIMITER = \036;ACCEPTINVCHARS = #;QUOTE = \037 COMPUPDATE = ON;AWS_IAM_ROLE=arn:aws:iam::<account ID>:role/<role-name>;REGION = ap-south-1
Specify a directory on the machine that hosts the Secure Agent.
If you do not add the options to the Copy command manually, the Secure Agent uses the default values.
S3 Server Side Encryption
Indicates that Amazon S3 encrypts data during upload.
Provide a customer master key ID in the connection property to enable this property. Default is not selected.
S3 Client Side Encryption
Indicates that the Secure Agent encrypts data using a private key.
Provide a master symmetric key ID in the connection property to enable this property. If you enable both server-side and client-side encryptions, the Secure Agent ignores the server-side encryption.
Analyze Target Table
Runs an ANALYZE command on the target table.
The query planner on Amazon Redshift updates the statistical metadata to build and choose optimal plans to improve the efficiency of queries.
Vacuum Target Table
Recovers disk space and sorts the row in a specified table or all tables in the database.
You can select the following recovery options:
  • None
  • Full
  • Sort Only
  • Delete Only
  • Reindex
Default is None.
Prefix to retain staging files on S3
Retains staging files on Amazon S3.
Provide both a directory prefix and a file prefix separated by a slash (/) or only a file prefix to retain staging files on Amazon S3. For example,
backup_dir/backup_file
or
backup_file
.
Success File Directory
Directory for the Amazon Redshift success file.
Specify a directory on the machine that hosts the Secure Agent.
Error File Directory
Directory for the Amazon Redshift error file.
Specify a directory on the machine that hosts the Secure Agent.
Treat Source Rows As
Overrides the default target operation.
Default is
INSERT
.
Select one of the following override options:
NONE
By default, none is enabled. The Secure Agent considers the task operation that you select in the
Operation
target property.
INSERT
Performs insert operation. If enabled, the Secure Agent inserts all rows flagged for insert. If disabled, the Secure Agent rejects the rows flagged for insert.
DELETE
Performs delete operation. If enabled, the Secure Agent deletes all rows flagged for delete. If disabled, the Secure Agent rejects all rows flagged for delete.
UPDATE and UPSERT
Performs update and upsert operations. To perform an update operation, you must map the primary key column and at least one column other than primary key column. You can select the following data object operation attributes:
  • Update as Update: The Secure Agent updates all rows as updates.
  • Update else Insert: The Secure Agent updates existing rows and inserts other rows as if marked for insert.
Amazon Redshift V2 Connector does not support the Upsert operation in the Upgrade Strategy transformation. To use an Update Strategy transformation to write data to an Amazon Redshift target, you must select
Treat Source Rows As
as
None
.
By default, the Secure Agent performs the task operation based on the value that you specify in the
Operation
target property. However, if you specify an option in the
Treat Source Rows As
property, the Secure Agent ignores the value of that you specify in the
Operation
target property or in the Update Strategy transformation.
Override Target Query
Overrides the default update query that the Secure Agent generates for the update operation with the update query that you specify.
TransferManager Thread Pool Size
Number of threads to write data in parallel.
Default is 10.
Pre-SQL
The pre-SQL commands to run a query before you read data from Amazon Redshift. You can also use the UNLOAD or COPY command. The command you specify here is processed as a plain text.
Post-SQL
The post-SQL commands to run a query after you write data to Amazon Redshift. You can also use the UNLOAD or COPY command. The command you specify here is processed as a plain text.
Preserve record order on write
Retains the order of the records when you read data from a CDC source and write data to an Amazon Redshift target.
Use this property when you create a mapping to capture the changed record from a CDC source. This property enables you to avoid inconsistencies between the CDC source and target.
Minimum Upload Part Size
Minimum size of the Amazon Redshift object to upload an object.
Default is 5 MB.
Number of files per batch
Calculates the number of the staging files per batch.
If you do not provide the number of files, Amazon Redshift V2 Connector calculates the number of the staging files.
Schema Name
Overrides the default schema name.
Target table name
Overwrites the default target table name.
Recovery Schema Name
Schema that contains recovery information stored in the
infa_recovery_table
table on the target system to resume the extraction of the changed data from the last checkpoint.
Temporary Credential Duration
The time duration during which an IAM user can use the dynamically generated temporarily credentials to access the AWS resource. Enter the time duration in seconds.
Default is 900 seconds.
If you require more than 900 seconds, you can set the time duration maximum up to 12 hours in the AWS console and then enter the same time duration in this property.
Forward Rejected Rows
This property is not applicable for Amazon Redshift V2 Connector.

0 COMMENTS

We’d like to hear from you!