Snowflake Data Cloud Connector


Configuring the batch size and the number of local staging files


To achieve optimal performance of a mapping that writes data to Snowflake, use the default values for both the batch size and the number of local staging files. The mapping determines appropriate defaults based on the amount of data it processes. Unless your use case requires specific values, it is recommended that you run the mapping with the defaults.
The mapping gives priority to whichever of these two thresholds is reached first. For example, if the number of local staging files threshold is met before the specified batch row size is reached, the number of local staging files determines when an insert command is issued.
For example, the following table shows the number of insert commands issued for various combinations of the batch size and the number of local staging files in a write operation to Snowflake:
Number of     Row Size    Default File   Batch      Number of Local                 Number of
Source Rows   (bytes)     Size (bytes)   Row Size   Staging Files                   Inserts
-----------   ---------   ------------   --------   -----------------------------   ---------
4             52019206    52428800       4          1                               2
4             52019206    52428800       4          Not configured (default: 64)    1
4             52019206    52428800       1          Not configured (default: 64)    4
4             52019206    52428800       3          1                               2
4             52019206    52428800       1          1                               4
Scenario 1: The batch row size is 4 and the number of local staging files is 1.
With a batch row size of 4 and 1 local staging file, two insert statements are executed in Snowflake.
The first row (52019206 bytes) fits within the default 50 MB (52428800 bytes) file size limit, but after the second row the file grows to 104038412 bytes and exceeds the limit, so the file is closed and a new file is created for the next incoming row. Because the number of local staging files is 1, the stage can hold a maximum of one file, and an insert command is issued each time a file is closed. Two files of two rows each are created, each 104038412 bytes in size (52019206 bytes per row), leading to two insert commands.
Scenario 2: The batch row size is 4 and the number of local staging files value is not set.
With the default value of 64 local staging files, the staging file threshold is never reached, so the batch row size of 4 governs the write. All four rows are written with a single insert command.
Scenario 3: The batch row size is 1 and the number of local staging files value is not set.
With a batch row size of 1, an insert command is issued after every row, so the mapping issues four insert statements for the four rows. The staging file threshold is never reached.
Scenario 4: The batch row size is 3 and the number of local staging files value is 1.
With a batch row size of 3 and one local staging file, each staging file is closed after two rows, when it exceeds the 50 MB limit. This reaches the staging file threshold before the batch row size of 3, so the mapping issues two insert commands, one after the first two rows and one after the last two rows.
Scenario 5: The batch row size is 1 and the number of local staging files value is 1.
With a batch row size of 1 and one local staging file, the batch row threshold is reached after every row, so four insert commands are issued, one for each row. The staging file threshold is never met because the batch threshold always triggers first.
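The interaction between the two thresholds in the scenarios above can be sketched as a small simulation. This is a simplified model for illustration only, not the connector's actual implementation; the function name and parameters are hypothetical. It assumes that whichever threshold is reached first, the batch row size or the number of local staging files, triggers an insert command:

```python
DEFAULT_FILE_SIZE = 52428800   # default 50 MB local staging file size limit
DEFAULT_STAGING_FILES = 64     # default number of local staging files

def count_inserts(num_rows, row_size, batch_row_size,
                  staging_files=DEFAULT_STAGING_FILES,
                  file_size_limit=DEFAULT_FILE_SIZE):
    """Model how many insert commands a write operation would issue."""
    inserts = 0
    batch_rows = 0      # rows accumulated since the last insert
    file_bytes = 0      # size of the currently open staging file
    staged_files = 0    # closed files held in the stage

    for _ in range(num_rows):
        file_bytes += row_size
        batch_rows += 1

        if batch_rows >= batch_row_size:
            # Batch row threshold reached first: issue an insert.
            inserts += 1
            batch_rows = file_bytes = staged_files = 0
        elif file_bytes > file_size_limit:
            # Current file exceeds the size limit: close and stage it,
            # and create a new file for the next incoming row.
            staged_files += 1
            file_bytes = 0
            if staged_files >= staging_files:
                # Staging file threshold reached first: issue an insert.
                inserts += 1
                batch_rows = staged_files = 0

    if batch_rows > 0 or staged_files > 0:
        inserts += 1    # flush any remaining rows
    return inserts
```

Running this model with the values from the table reproduces the insert counts of all five scenarios, for example `count_inserts(4, 52019206, batch_row_size=4, staging_files=1)` returns 2.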
