Google BigQuery Connectors

Rules and guidelines for mappings in advanced mode

Consider the following guidelines when you create a mapping in advanced mode:
  • Use a Google BigQuery V2 connection in hybrid mode.
  • You cannot configure key range and pass-through partitioning in a mapping that reads from a Google BigQuery source.
  • When you perform an update, upsert, or delete operation on a Google BigQuery target table that resides in a region other than the US regions, you must specify the Region ID connection property in the Google BigQuery V2 connection.
  • When you configure a simple filter condition on a column of Date, Time, Timestamp, or DateTime data type in the source table, ensure that you specify the following DateTime format:
    YYYY-MM-DD-HH24:MM:SS:MS
  • When you read data from a column of String data type in a Google BigQuery source and write data to a column of Date, DateTime, Time, or Timestamp data type in a Google BigQuery target, ensure that the string data in the source is in the following DateTime format (see the string conversion example after this list):
    YYYY-MM-DD-HH24:MM:SS
  • When you read data from or write data to a partitioned table in Google BigQuery, ensure that you clear the Use Legacy SQL for Custom Query option in the Google BigQuery V2 connection. Otherwise, data preview fails.
  • You cannot edit the metadata of fields of hierarchical data types in a mapping in advanced mode.
  • When you import a Google BigQuery source or target table that contains a column of Byte data type, data preview displays a blank value for that column.
  • When you perform an update operation on a Google BigQuery target, ensure that the update column does not contain NULL values. Otherwise, the Secure Agent fails to update the rows.
  • When you perform an upsert operation on a Google BigQuery target, ensure that the update column does not contain NULL values. Otherwise, the Secure Agent performs an insert operation instead of an update operation.
  • When you perform a delete operation on a Google BigQuery target, ensure that the update column does not contain NULL values. Otherwise, the Secure Agent fails to delete the rows. To check the update column for NULL values, see the null-value check example after this list.
  • When you parameterize the source object, the Input Parameters tab in the mapping task displays the source object name as Default.
  • When you write data to a Google BigQuery target, specify the staging file name, and persist the staging file in Google Cloud Storage, you must delete the staging file after you run the mapping, as shown in the staging file cleanup example after this list. Otherwise, when you rerun the mapping, the mapping fails with the following error:
    java.lang.RuntimeException: The object path already exists
  • To achieve maximum throughput when you read data from a large dataset, the value of the Number of Spark Partitions property must be equal to the number of Spark executors defined for the Google Cloud Platform advanced cluster.
  • When you use a parameterized Google BigQuery V2 connection in the Target transformation, the Update Column field does not display in the target properties. In the Target transformation, select a valid connection and object from the lists, and then select the operation.
  • When the Google BigQuery target object is parameterized and the selected operation is data driven or upsert in the mapping, the Update Column field does not display in the dynamic mapping task target properties.
  • Do not configure an override to the update strategy from the task properties. The agent does not honor the order of precedence and considers the update strategy value specified in the mapping.
  • When you read data from and write data to a Google BigQuery column of the Record data type that contains data of the Numeric data type, the precision of the Numeric data must not exceed 15 digits.
  • If an existing mapping is set with a precision of 28 digits for the Numeric data type, you can refresh the mapping to set the default precision of 38 digits. However, hierarchical data types support a precision of only up to 28 digits.
  • When you use Create New at Runtime to write data to Google BigQuery, the mapping task creates the physical target based on the fields from the upstream transformation in the initial run. However, if you later delete the created target table and rerun the mapping task, the Secure Agent fails to create the target table again.
  • You cannot read Integer data in a column of Array or Struct data type from a Google Cloud Storage Avro, JSON, or Parquet file and write the data to a Google BigQuery target.
  • You cannot read complex data types that contain a multidimensional array from a Google Cloud Storage source and write the data to a Google BigQuery target.
  • When you read data from a column of the DateTime data type in a Google BigQuery source, ensure that the column does not contain DateTime values earlier than 1970-01-01. Otherwise, the mapping fails with the following error:
    java.lang.RuntimeException
  • When you run a mapping to read DateTime data from a Google BigQuery source column of the Record data type with the YYYY-MM-DD HH24:MI:SS.US format and write data to a Google BigQuery target column of the Record data type that contains data of the DateTime data type, the Secure Agent truncates the microsecond values and writes the DateTime values in the YYYY-MM-DD HH24:MI:SS.MS format.
  • When you read data from or write data to a Google BigQuery table, ensure that the Google BigQuery source or target does not contain more than 2,000 columns (see the column count example after this list). Otherwise, the mapping fails with the following error:
    HTTP POST request failed due to IO error.
  • In advanced mode, you cannot write null values of the Array data type to a Google BigQuery target. If you do, the mapping fails with a runtime exception.
  • Date/time values in the local time zone are converted to UTC when you load data to the Google BigQuery target, which results in a data mismatch between the source and target for the date/time data types. Data conversion from the String data type to date/time data types also results in a data mismatch between the source and target.
  • When you switch a mapping to advanced mode and the mapping uses hierarchical data types, you must manually reimport the Google BigQuery object and create the mapping again.
  • When you read or write hierarchical data types in a mapping, the time zone defaults to the time zone of the Secure Agent host machine. You must change the time zone to UTC to run the mapping successfully. To change to the UTC time zone, you can set the Spark session properties for a specific task from the task properties. The resulting property values appear after this list.
    To set the properties for a specific task, navigate to the Spark session properties in the task properties, and perform the following steps:
    • Select the session property name spark.driver.extraJavaOptions and set the value to -Duser.timezone=UTC.
    • Select spark.executor.extraJavaOptions and set the value to -Duser.timezone=UTC.
  • When you configure a mapping in advanced mode that writes more than 10 MB of data to a row, the mapping fails.
  • When you configure a mapping to write Date, Datetime, or Timestamp values such as 0001-01-01 00:00:00 or 0001-01-02 into a Timestamp column in the target, the mapping fails. Also, data corruption might occur when you write Date, Datetime, or Timestamp values earlier than 1910-01-01 00:00:00.
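The following examples illustrate some of the guidelines in this list. They are minimal sketches, not part of the product documentation, and every project, dataset, table, bucket, column, and file name in them is a placeholder.

String conversion example. A sketch that produces a source string in the YYYY-MM-DD-HH24:MM:SS DateTime format required when you write String data to a Date, DateTime, Time, or Timestamp target column. It assumes that the YYYY, MM, DD, HH24, MM, and SS tokens map to the year, month, day, 24-hour, minute, and second fields:

    from datetime import datetime

    # Format a sample value as YYYY-MM-DD-HH24:MM:SS (assumed token mapping).
    value = datetime(2024, 5, 1, 14, 30, 0)
    print(value.strftime("%Y-%m-%d-%H:%M:%S"))   # 2024-05-01-14:30:00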
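Null-value check example. Before you run an update, upsert, or delete operation, you can confirm that the update column does not contain NULL values. This sketch uses the google-cloud-bigquery Python client; the project, dataset, table, and column names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")

    # Count rows where the update column is NULL in the target table.
    query = """
        SELECT COUNT(*) AS null_rows
        FROM `my-project.my_dataset.my_target_table`
        WHERE update_key_column IS NULL
    """
    for row in client.query(query).result():
        print(f"Rows with a NULL update column: {row.null_rows}")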
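Staging file cleanup example. A sketch that deletes a persisted staging file from Google Cloud Storage after a mapping run so that the next run does not fail with the object-path error. It uses the google-cloud-storage Python client; the bucket and object names are placeholders for the staging file path that you configured:

    from google.cloud import storage

    client = storage.Client(project="my-project")
    bucket = client.bucket("my-staging-bucket")
    blob = bucket.blob("staging/my_mapping_staging_file.csv")

    # Delete the staging file if it still exists from the previous run.
    if blob.exists():
        blob.delete()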
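Column count example. A sketch that reads the table schema with the google-cloud-bigquery Python client to confirm that a source or target table does not exceed 2,000 columns. The table reference is a placeholder:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")

    # The schema lists one entry per column.
    table = client.get_table("my-project.my_dataset.my_wide_table")
    print(f"Column count: {len(table.schema)}")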
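Spark session properties for the UTC time zone. Setting the two session properties described in the list resolves to the following name-value pairs:

    spark.driver.extraJavaOptions=-Duser.timezone=UTC
    spark.executor.extraJavaOptions=-Duser.timezone=UTC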
