Table of Contents

Search

  1. Preface
  2. Part 1: Introduction to Google BigQuery connectors
  3. Part 2: Data Integration with Google BigQuery V2 Connector
  4. Part 3: Data Integration with Google BigQuery Connector

Google BigQuery Connectors

Google BigQuery Connectors

Understanding an SCD type 2 merge mapping

Understanding an SCD type 2 merge mapping

The SCD Type 2 merge mapping uses a BigQuery source and two target transformations that write to the same Google BigQuery table. One target transformation updates the table while the other transformation inserts data to the Google BigQuery table.
The following image shows a mapping that writes slowly changing dimension data to a Google BigQuery target table:
The image shows a mapping that writes slowly changing dimension data to a Google BigQuery table.
Add expression and lookup transformations to compare source data against the existing target data. You enter the lookup conditions and source columns that you want the Data Integration to compare against the existing target.
For each source row without a matching row in the target, the Expression transformation marks the new row. For each source row with a matching row in the target, the Expression transformation compares existing source and target columns with the MD5() function. If those columns do not match, the Expression marks the existing target row as an inactive row and inserts a new target row as an active row. The mapping then splits into two data groups using the Router transformation.
You must generate an UUID value through the Expression transformation and add it as a unique ID column and also as the first column in the target. Additionally, you can add an active status flag, MD5() hash value, start timestamp, and end timestamp columns to write to the target through the Expression.
The first data flow from the Router transformation passes only new rows to the Expression transformation. The Expression transformation inserts new rows to the target. The Expression transformation also assigns an UUID value and updates the start timestamp, MD5() function hash value, and the active status as 1 for each new row.
In the second data flow, the Router transformation passes only changed rows to pass to the Expression transformation. The Expression transformation inserts changed rows to the target. The Expression transformation updates the active status as 0 and adds the end timestamp for the existing row in the target.

0 COMMENTS

We’d like to hear from you!