Frequently Asked Questions for Google BigQuery Connector

Performance Tuning Questions

When should I use direct mode to read data from a Google BigQuery source?
Use direct mode when the volume of data that you want to read is small. In direct mode, Google BigQuery Connector directly reads data from a Google BigQuery source.
When should I use staging mode to read data from a Google BigQuery source?
Use staging mode when you want to read large volumes of data in a cost-efficient manner.
In staging mode, Google BigQuery Connector first exports the data from the Google BigQuery source into Google Cloud Storage. After the export is complete, Google BigQuery Connector downloads the data from Google Cloud Storage into a local stage file. You can configure the local stage file directory in the advanced source properties. Google BigQuery Connector then reads the data from the local stage file.
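The export, download, and read sequence can be sketched locally in Python. This is a simplified stand-in for the connector's staging flow, not its actual implementation; the stage directory and file names are illustrative:

```python
import csv
import gzip
import os
import tempfile

# Hypothetical local stage directory (in the actual connector, this is
# configurable in the advanced source properties).
stage_dir = tempfile.mkdtemp(prefix="bq_stage_")

# Simulated step 1: a compressed CSV stage file, standing in for data
# exported from BigQuery into Google Cloud Storage and then downloaded
# to the local stage directory.
stage_file = os.path.join(stage_dir, "export_000000000000.csv.gz")
rows = [["id", "name"], ["1", "alpha"], ["2", "beta"]]
with gzip.open(stage_file, "wt", newline="") as f:
    csv.writer(f).writerows(rows)

# Simulated step 2: the reader consumes records from the local stage file.
with gzip.open(stage_file, "rt", newline="") as f:
    read_back = list(csv.reader(f))

print(f"read {len(read_back) - 1} data rows from {stage_file}")
```

The round trip through a local file is the reason staging mode scales to large volumes: the read happens from local disk rather than through repeated API calls.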
When should I use bulk mode to write data to a BigQuery target?
Use bulk mode when you want to write large volumes of data and improve the performance. In bulk mode, Google BigQuery Connector first writes the data to a staging file in Google Cloud Storage. When the staging file contains all the data, Google BigQuery Connector loads the data from the staging file to the BigQuery target.
When should I use streaming mode to write data to a BigQuery target?
Use streaming mode when you want the Google BigQuery target data to be immediately available for querying and real-time analysis. In streaming mode, Google BigQuery Connector writes data directly to the BigQuery target, appending the data to the target table.
Evaluate Google's streaming quota policies and billing policies before you use streaming mode.
Which data format should I use for the staging file to improve the performance?
Use the CSV data format to improve the performance.
Should I enable staging file compression when I read data from a Google BigQuery source?
You can enable staging file compression to improve the performance.
Should I enable staging file compression when I write data to a Google BigQuery target?
You can enable staging file compression when the network bandwidth is low. Enabling compression reduces the time that Google BigQuery Connector takes to write data to Google Cloud Storage. However, there will be a performance degradation when Google BigQuery Connector writes data from Google Cloud Storage to the Google BigQuery target.
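The trade-off can be illustrated in Python with the standard gzip module. This demonstrates the general principle, not the connector's internal code; the sample payload is invented:

```python
import gzip

# A repetitive CSV payload, standing in for a staging file.
payload = ("id,name,region\n" + "\n".join(
    f"{i},customer_{i},us-central1" for i in range(10000))).encode("utf-8")

compressed = gzip.compress(payload)

# Compression shrinks the bytes sent over the network to Cloud Storage,
# which helps when bandwidth is the bottleneck.
ratio = len(compressed) / len(payload)
print(f"original: {len(payload)} bytes, "
      f"compressed: {len(compressed)} bytes (ratio {ratio:.2f})")

# The staging file must later be decompressed before the data reaches the
# target table, which is the extra cost the answer above refers to.
restored = gzip.decompress(compressed)
```

Whether the bandwidth saved outweighs the decompression overhead depends on your network; measure both modes with a representative data volume.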
Does the Number of Threads for Downloading Staging Files option impact the performance when I read data from a BigQuery source in staging mode?
No. The value of the Number of Threads for Downloading Staging Files option does not affect the performance when you read data from a Google BigQuery source in staging mode. However, if you specify a value greater than one, the reader consumes more resources.
How do I configure the Number of Threads for Uploading Staging Files option to improve the performance?
To improve the performance when you write data to a Google BigQuery target in bulk mode, you can set the Number of Threads for Uploading Staging Files option to 10. This recommendation is based on observations in an internal Informatica environment using data from real-world scenarios. The performance might vary based on individual environments and other parameters. You might need to test different settings for optimal performance.
If you configure more than one thread, you must increase the Java heap size in the JVMOption3 field for DTM under the System Configuration Details section of the Secure Agent. You must then restart the Secure Agent for the changes to take effect.
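For example, the DTM heap setting might look like the following under System Configuration Details (the -Xmx value is illustrative; size it to your data volume and the memory available on the Secure Agent machine):

```
Type:  DTM
Name:  JVMOption3
Value: -Xmx2048m
```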
How do I configure the local stage file directory when I write to a Google BigQuery target?
Set the local stage file directory to a directory on your local machine to improve the performance. By default, the local stage file directory is set to /Temp under the directory where the Secure Agent is installed.
How can I run concurrent mapping tasks?
To run concurrent mapping tasks, perform the following steps:
  1. Open Administrator and select Runtime Environments.
  2. Select the Secure Agent that is used as the runtime environment for a mapping task.
  3. Click Edit Secure Agent in Actions.
    The Edit Secure Agent page appears.
  4. Select Data Integration Server as the Service in the Custom Configuration Details section.
  5. Select Tomcat as the Type in the Custom Configuration Details section.
  6. Add maxDTMProcesses in the Name field.
  7. Specify a number in the Value field based on the number of concurrent mapping tasks that you want to run. For example, specify 500 in the Value field to run 500 concurrent mapping tasks.
  8. Repeat steps 2 through 7 on every Secure Agent that is used as the runtime environment for a mapping task.
  9. Create a schedule on the Schedules page or in the mapping task wizard.
  10. Associate the schedule with all the mapping tasks that you want to run concurrently.
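Taken together, the configuration steps above amount to adding one custom property in the Custom Configuration Details section (500 is the example value from step 7):

```
Service: Data Integration Server
Type:    Tomcat
Name:    maxDTMProcesses
Value:   500
```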
This recommendation is based on observations in an internal Informatica environment using data from real-world scenarios. The performance might vary based on individual environments and system configuration. You might need to test different settings for optimal performance.
Google BigQuery limits the number of API calls made per service account. To avoid run-time failures when you run concurrent mapping tasks, use Google BigQuery connections with different service account IDs.


Updated August 06, 2020