PowerCenter Modernization

Google BigQuery advanced properties

The following tables list the Google BigQuery properties that you can configure for a Connection Map configuration:

Object properties

The following table describes the Google BigQuery connection properties:
Connection property
Description
Source Dataset ID
Name of the dataset that contains the source table and target table that you want to connect to.
Default is datasetID_changeit.
Google BigQuery supports only datasets that reside in the US region.

Source advanced properties

The following table describes the Google BigQuery source advanced properties:
Property name
Description
Source Dataset ID
Overrides the Google BigQuery dataset name that you specified in the connection.
Default is $$sf_database.
Source Staging Dataset
Overrides the Google BigQuery staging dataset name that you specified in the connection and the Source Dataset ID source advanced property.
Number of Rows to Read
Specifies the number of rows to read from the Google BigQuery source table.
Job Poll Interval In Seconds
The number of seconds after which Google BigQuery Connector polls the status of the read job operation.
Default is 10.
Read Mode
Specifies the read mode to read data from the Google BigQuery source.
You can select one of the following read modes:
  • Direct. In direct mode, Google BigQuery Connector reads data directly from the Google BigQuery source table.
    When you use hybrid and complex connection mode, you cannot use direct mode to read data from the Google BigQuery source.
  • Staging. In staging mode, Google BigQuery Connector exports data from the Google BigQuery source into Google Cloud Storage. After the export is complete, Google BigQuery Connector downloads the data from Google Cloud Storage into the local stage file and then reads data from the local stage file.
Default is Staging mode.
Use EXPORT DATA Statement to stage
Uses the EXPORT DATA statement to export data from Google BigQuery to Google Cloud Storage.
If the query contains an ORDER BY clause, the specified order is maintained when you export the data.
This property applies to staging mode.
Default is false.
Number of Threads for Downloading Staging Files
Specifies the number of files that Google BigQuery Connector downloads at a time to enable parallel download.
This property applies to staging mode.
Default is 1.
Local Stage File Directory
Specifies the directory on your local machine where Google BigQuery Connector stores the Google BigQuery source data temporarily before it reads the data.
This property applies to staging mode.
Default is $PMTempDir.
Staging File Name
Name of the staging file where data from the Google BigQuery source table is exported to Google Cloud Storage.
This property applies to staging mode.
Persist Extract Staging File After Download
Indicates whether Google BigQuery V2 Connector must persist the staging file after it reads data from the staging file.
By default, Google BigQuery V2 Connector deletes the staging file.
Persist Destination Table
Indicates whether Google BigQuery V2 Connector must persist the query results table after it reads data from the query results table.
By default, Google BigQuery V2 Connector deletes the query results table.
Data format of the staging file
Specifies the data format of the staging file. You can select one of the following data formats:
  • JSON (Newline Delimited). Supports flat and record data with nested and repeated fields.
  • CSV. Supports flat data.
    In a .csv file, columns of the Timestamp data type are represented as floating point numbers, which can cause the milliseconds value to differ.
Default is JSON.
Enable Staging File Compression
Indicates whether to compress the size of the staging file in Google Cloud Storage before Google BigQuery Connector reads data from the staging file.
You can enable staging file compression to reduce cost and transfer time.
This property applies to staging mode.
Default is false.
Retry Options
Comma-separated list to specify the following retry options:
  • Retry Count. The number of retry attempts to read data from Google BigQuery.
  • Retry Interval. The time in seconds to wait between each retry attempt.
  • Retry Exceptions. The list of exceptions separated by pipe (|) character for which the retries are made.
Default is $$gbq_retry_options.
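For example, a value for this property might look like the following, where the option names and exception classes are hypothetical illustrations of the format described above:
RetryCount:3,RetryInterval:30,RetryExceptions:java.net.ConnectException|java.io.IOException
With such a value, the connector would retry a failed read three times, wait 30 seconds between attempts, and retry only for the two listed exceptions.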
Use Legacy SQL for SQL Override
Indicates that the SQL Override query is specified in legacy SQL.
Use the following format to specify a legacy SQL query for the SQL Override Query property:
SELECT <Col1, Col2, Col3> FROM [projectID:datasetID.tableName]
Clear this option to define a standard SQL override query.
Use the following format to specify a standard SQL query for the SQL Override Query property:
SELECT * FROM `projectID.datasetID.tableName`
Default is false.
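For example, assuming a hypothetical project myproject, dataset mydataset, table mytable, and columns EmpID and EmpName, the two dialects express the same override as follows:
Legacy SQL: SELECT EmpID, EmpName FROM [myproject:mydataset.mytable]
Standard SQL: SELECT EmpID, EmpName FROM `myproject.mydataset.mytable`
Legacy SQL brackets the table reference and separates the project with a colon, while standard SQL encloses the reference in backquotes and separates all parts with periods.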
pre SQL Configuration
Specify a pre-SQL configuration.
For example,
DestinationTable:PRESQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:False,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
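Each key-value pair in this configuration maps to a BigQuery query job setting: DestinationTable and DestinationDataset name the table and dataset that store the query results, FlattenResults controls whether nested and repeated results are flattened, WriteDisposition determines the behavior when the destination table already exists, and UseLegacySql selects the SQL dialect for the query.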
post SQL Configuration
Specify a post-SQL configuration.
For example,
DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:True,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
Label
You can assign a label for the transformation to organize and filter the associated jobs in the Google Cloud Platform Log Explorer.
Billing Project ID
The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs query and extract jobs.
If you omit the project ID here, the Secure Agent runs query and extract jobs in the Google Cloud project corresponding to the Project ID value specified in the Google BigQuery V2 connection.
Number of Spark Partitions
Specifies the maximum number of partitions that the Spark engine splits the data into.
Default is 1.

Target advanced properties

The following table describes the Google BigQuery target advanced properties:
Property name
Description
Target Dataset ID
Overrides the Google BigQuery dataset name that you specified in the connection.
Default is $$gbq_datasetID.
Write Mode
Specifies the mode to write data to the Google BigQuery target.
You can select one of the following modes:
  • Bulk. Google BigQuery V2 Connector first writes the data to a staging file in Google Cloud Storage. When the staging file contains all the data, Google BigQuery V2 Connector loads the data from the staging file to the BigQuery target. Google BigQuery V2 Connector then deletes the staging file unless you configure the task to persist the staging file.
  • Streaming. Google BigQuery V2 Connector directly writes data to the BigQuery target. Google BigQuery V2 Connector writes the data into the target row by row.
  • CDC. Applies only when you capture changed data from a CDC source. In CDC mode, Google BigQuery V2 Connector captures changed data from any CDC source and writes the changed data to a Google BigQuery target table.
Default is Bulk mode.
Data format of the staging file
Specifies the data format of the staging file. You can select one of the following data formats:
  • Avro
  • JSON (Newline Delimited). Supports flat and record data with nested and repeated fields.
  • Parquet
  • CSV. Supports flat data.
    In a .csv file, columns of the Timestamp data type are represented as floating point numbers, which can cause the milliseconds value to differ.
Only JSON format is applicable for mappings in advanced mode.
This property applies to bulk and CDC mode.
Avro and Parquet formats are not applicable when you perform a data driven operation.
Default is JSON.
Enable Staging File Compression
Select this option to compress the staging file before Google BigQuery V2 Connector writes the data to Google Cloud Storage and to decompress the staging file before it loads the data to the Google BigQuery target.
You can enable staging file compression to reduce cost and transfer time.
Default is false.
Local Stage File Directory
Specifies the directory on your local machine where Google BigQuery V2 Connector stores the files temporarily before writing the data to the staging file in Google Cloud Storage.
This property applies to bulk mode.
Default is $PMTempDir.
Use Default Column Values
Applicable when the selected data format for the staging file is CSV and the mapping contains unconnected ports. Includes the default column values for the unconnected ports from the staging file when it creates the target. This option applies when you have defined a default constraint value in the Google BigQuery source column. When you do not enable this option, the agent creates the target only with the connected ports and populates null or empty strings for the unconnected ports.
Default is true.
Enable Merge
Implements the Merge query to perform an update, upsert, delete, or data driven operation on a Google BigQuery target table.
If you select the Enable Data Driven property, you must select this option.
Default is disabled.
Target Staging Dataset
Optional. Overrides the Google BigQuery staging dataset name that you specified in the connection and the Target Dataset ID target advanced property.
Create Disposition
Specifies whether Google BigQuery V2 Connector must create the target table if it does not exist.
You can select one of the following values:
  • Create if needed. If the table does not exist, Google BigQuery V2 Connector creates the table.
  • Create never. If the table does not exist, Google BigQuery V2 Connector does not create the table and displays an error message.
Create disposition is applicable only when you perform an insert operation on a Google BigQuery target.
Write Disposition
Specifies how Google BigQuery V2 Connector must write data in bulk mode if the target table already exists.
You can select one of the following values:
  • Write append. If the target table exists, Google BigQuery V2 Connector appends the data to the existing data in the table.
  • Write truncate. If the target table exists, Google BigQuery V2 Connector overwrites the existing data in the table.
  • Write empty. If the target table exists and contains data, Google BigQuery V2 Connector displays an error and does not write the data to the target. Google BigQuery V2 Connector writes the data to the target only if the target table does not contain any data.
Write disposition is applicable for bulk mode.
Write disposition is applicable only when you perform an insert operation on a Google BigQuery target.
Streaming Template Table Suffix
Specify the suffix to add to the individual target tables that Google BigQuery V2 Connector creates based on the template target table.
This property applies to streaming mode.
If you select the Enable Merge option, Google BigQuery V2 Connector ignores this property.
Streaming mode is not applicable when you perform a data driven operation.
Rows per Streaming Request
Specifies the number of rows that Google BigQuery V2 Connector streams to the BigQuery target for each request.
Default is 500 rows.
The maximum row size that Google BigQuery V2 Connector can stream to the Google BigQuery target for each request is 10 MB.
This property applies to streaming mode.
Streaming mode is not applicable when you perform a data driven operation.
Staging File Name
Name of the staging file that Google BigQuery V2 Connector creates in the Google Cloud Storage before it loads the data to the Google BigQuery target.
This property applies to bulk mode.
Persist Staging File After Loading
Indicates whether Google BigQuery V2 Connector must persist the staging file in the Google Cloud Storage after it writes the data to the Google BigQuery target. You can persist the staging file if you want to archive the data for future reference.
By default, Google BigQuery V2 Connector deletes the staging file in Google Cloud Storage.
This property applies to bulk mode.
Job Poll Interval In Seconds
The number of seconds after which Google BigQuery V2 Connector polls the status of the write job operation.
Default is 10.
Number of Threads for Uploading Staging File
The number of files that Google BigQuery V2 Connector must create to upload the staging file in bulk mode.
Allow Quoted Newlines
Indicates whether Google BigQuery V2 Connector must allow quoted data sections that contain newline characters in a .csv file.
Field Delimiter
Indicates whether Google BigQuery V2 Connector must allow field separators for the fields in a .csv file.
Allow Jagged Rows
Indicates whether Google BigQuery V2 Connector must accept the rows without trailing columns in a .csv file.
Suppress post SQL on Error
Indicates whether the Secure Agent must abort the post-SQL query execution in case the task fails to write data to the Google BigQuery target table due to errors.
Default is disabled.
pre SQL Configuration
Specify a pre-SQL configuration.
For example,
DestinationTable:PRESQL_TGT2,
DestinationDataset:EMPLOYEE,
FlattenResults:False,
WriteDisposition:WRITE_TRUNCATE,
UseLegacySql:False
post SQL Configuration
Specify a post-SQL configuration.
For example,
DestinationTable:POSTSQL_SRC,
DestinationDataset:EMPLOYEE,
FlattenResults:True,
UseLegacySql:False
Quote Char
Specifies the quote character to skip when you write data to Google BigQuery. When you write data to Google BigQuery and the source table contains the specified quote character, the task fails. Change the quote character value to a value that does not exist in the source table.
Default is double quotes.
Allow Duplicate Inserts
Indicates that the Secure Agent can insert duplicate rows into the Google BigQuery target.
Applicable only when you perform a data driven operation and DD_INSERT is specified in the data driven condition.
Default is not selected.
Disable Duplicate Update Rows
Determines whether, when multiple incoming rows attempt to update the same target row, the Secure Agent processes only one of the incoming rows and ignores the rest.
Select this option to configure the mapping to process only one of the incoming rows and ignore the rest.
Default is disabled.
Label
You can assign a label for the transformation to organize and filter the associated jobs in the Google Cloud Platform Log Explorer.
Billing Project ID
The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs query and load jobs.
If you omit the project ID here, the Secure Agent runs query and load jobs in the Google Cloud project corresponding to the Project ID value specified in the Google BigQuery V2 connection.
Forward Rejected Rows
Applicable only when you configure the DD_REJECT constant in the data driven condition to reject all the rows.
Otherwise, this property is not applicable for Google BigQuery V2 Connector.

Lookup advanced properties

The following table describes the Google BigQuery lookup transformation advanced properties:
Property name
Description
Source Dataset ID
Overrides the Google BigQuery dataset name that you specified in the connection.
Default is $$gbq_datasetID.
Source Staging Dataset
Overrides the Google BigQuery staging dataset name that you specified in the Lookup transformation.
Job Poll Interval In Seconds
The number of seconds after which Google BigQuery V2 Connector polls the status of the read job operation.
Default is 10.
Read Mode
Specifies the read mode to read data from the Google BigQuery source.
You can select one of the following read modes:
  • Direct. In direct mode, Google BigQuery V2 Connector reads data directly from the Google BigQuery source table.
    When you use hybrid and complex connection mode, you cannot use direct mode to read data from the Google BigQuery source.
  • Staging. In staging mode, Google BigQuery V2 Connector exports data from the Google BigQuery source into Google Cloud Storage. After the export is complete, Google BigQuery V2 Connector downloads the data from Google Cloud Storage into the local stage file and then reads data from the local stage file.
Default is Staging mode.
Use EXPORT DATA Statement to stage
Uses the EXPORT DATA statement to export data from Google BigQuery to Google Cloud Storage.
If the query contains an ORDER BY clause, the specified order is maintained when you export the data.
This property applies to staging mode.
Default is true.
Number of Threads for Downloading Staging Files
Specifies the number of files that Google BigQuery Connector downloads at a time to enable parallel download.
This property applies to staging mode.
Default is 1.
Local Stage File Directory
Specifies the directory on your local machine where Google BigQuery V2 Connector stores the Google BigQuery source data temporarily before it reads the data.
This property applies to staging mode.
Default is $PMTempDir.
Data format of the staging file
Specifies the data format of the staging file. You can select one of the following data formats:
  • Avro
  • JSON (Newline Delimited). Supports flat and record data with nested and repeated fields.
  • Parquet
  • CSV. Supports flat data.
    In a .csv file, columns of the Timestamp data type are represented as floating point numbers, which can cause the milliseconds value to differ.
Only JSON format is applicable for mappings in advanced mode.
This property applies to bulk and CDC mode.
Avro and Parquet formats are not applicable when you perform a data driven operation.
Default is JSON.
Enable Staging File Compression
Indicates whether to compress the size of the staging file in Google Cloud Storage before Google BigQuery V2 Connector reads data from the staging file.
You can enable staging file compression to reduce cost and transfer time.
This property applies to staging mode.
Default is false.
Retry Options
Comma-separated list to specify the following retry options:
  • Retry Count. The number of retry attempts to read data from Google BigQuery.
  • Retry Interval. The time in seconds to wait between each retry attempt.
  • Retry Exceptions. The list of exceptions separated by pipe (|) character for which the retries are made.
Default is $$gbq_retry_options.
Use Legacy SQL for SQL Override
Indicates that the SQL Override query is specified in legacy SQL.
Use the following format to specify a legacy SQL query for the SQL Override Query property:
SELECT <Col1, Col2, Col3> FROM [projectID:datasetID.tableName]
Clear this option to define a standard SQL override query.
Use the following format to specify a standard SQL query for the SQL Override Query property:
SELECT * FROM `projectID.datasetID.tableName`
Number of Rows to Read
Specifies the number of rows to read from the Google BigQuery source table.
Staging File Name
Name of the staging file where data from the Google BigQuery source table is exported to Google Cloud Storage.
This property applies to staging mode.
Persist Extract Staging File After Download
Indicates whether Google BigQuery V2 Connector must persist the staging file in Google Cloud Storage after it reads data from the staging file. You can persist the staging file if you want to archive the data for future reference.
By default, Google BigQuery V2 Connector deletes the staging file in Google Cloud Storage.
This property applies to staging mode.
Persist Destination Table
Indicates whether Google BigQuery V2 Connector must persist the query results table after it reads data from the query results table.
By default, Google BigQuery V2 Connector deletes the query results table.
pre SQL Configuration
Specify a pre-SQL configuration.
For example,
DestinationTable:PRESQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:False,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
post SQL Configuration
Specify a post-SQL configuration.
For example,
DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:True,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
Label
You can assign a label for the transformation to organize and filter the associated jobs in the Google Cloud Platform Log Explorer.
Billing Project ID
The project ID for the Google Cloud project that is linked to an active Google Cloud Billing account where the Secure Agent runs query and extract jobs.
If you omit the project ID here, the Secure Agent runs query and extract jobs in the Google Cloud project corresponding to the Project ID value specified in the Google BigQuery V2 connection.
Number of Spark Partitions
Specifies the maximum number of partitions that the Spark engine splits the data into.
Default is 1.
