When you repartition data, you specify a row count. The Data Vault uses the row count to calculate the total number of repartitioned data files for a table. Then, the Data Vault determines the minimum and maximum values for each repartitioned data file.
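The file count calculation can be sketched as follows. This is a minimal illustration that assumes the Data Vault divides the table's total number of rows by the configured row count and rounds up, so any leftover rows go into one final, smaller file. The function name `repartitioned_file_count` is hypothetical and is not part of the Data Vault.

```python
def repartitioned_file_count(total_rows: int, row_count: int) -> int:
    """Hypothetical helper: estimate how many repartitioned data files a
    table produces for a given row count.

    Assumes each file holds at most `row_count` rows and that any leftover
    rows go into one final, smaller file (ceiling division).
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    # Integer ceiling division avoids floating-point rounding on large tables.
    return -(-total_rows // row_count)


# A 40-million-row table stored with 10 million rows per file.
print(repartitioned_file_count(40_000_000, 10_000_000))  # 4
# Halving the row count doubles the number of files.
print(repartitioned_file_count(40_000_000, 5_000_000))   # 8
# Leftover rows produce one extra, partially filled file.
print(repartitioned_file_count(41_000_000, 10_000_000))  # 5
```

Under this assumption, the row count and the number of files move in opposite directions: a smaller row count produces more files, and a larger row count produces fewer.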
You can configure the following row count options when you repartition data files:
Keep the same row count
Configure the same row count as the original data files to create the same number of repartitioned data files. You can view the row count for each data file in the file size report.
Decrease the row count
Configure a smaller row count to increase the number of repartitioned data files. You might want to increase the number of repartitioned data files to help improve the performance of complex queries.
For example, you run a query that includes a complex join on a table that has 10 million rows in each data file. The query is slow and the system runs into memory issues because each data file has too many rows to join. You repartition the table to include 5 million rows in each repartitioned data file.
Increase the row count
Configure a higher row count to decrease the number of repartitioned data files. You might want fewer files on disk, for example, to simplify your backup strategy. You have a table that users do not query, and the table includes a large number of data files. You repartition the table to include 75% fewer data files.
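To estimate the row count that produces a given reduction in the number of files, you can reverse the calculation. The following sketch assumes the file count equals the total number of rows divided by the row count, rounded up. The names `ceil_div` and `row_count_for_reduction` are hypothetical, and the table sizes are illustrative.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def row_count_for_reduction(total_rows: int, current_files: int, reduction_pct: int) -> int:
    """Hypothetical helper: estimate the row count that cuts the number of
    data files by `reduction_pct` percent.

    Assumes the file count is the total row count divided by the row count,
    rounded up, so the result yields at most the target number of files.
    """
    if total_rows <= 0 or current_files <= 0:
        raise ValueError("total_rows and current_files must be positive")
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in the range [0, 100)")
    # Number of files to keep, rounded up; always at least 1 because reduction_pct < 100.
    target_files = ceil_div(current_files * (100 - reduction_pct), 100)
    # Spread the rows evenly over the target number of files.
    return ceil_div(total_rows, target_files)


# 200 million rows currently stored in 100 files of 2 million rows each.
new_row_count = row_count_for_reduction(200_000_000, 100, 75)
print(new_row_count)                         # 8000000
print(ceil_div(200_000_000, new_row_count))  # 25
```

In this example, keeping one quarter of the files requires a row count four times the original.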
After the Data Vault determines the total number of repartitioned data files, it calculates the minimum and maximum value range for each repartitioned data file. The Data Vault then uses these value ranges to copy data from the original data files to the repartitioned data files.
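The following sketch shows one way that minimum and maximum value ranges could be derived and then used to copy data. It is not the Data Vault's documented algorithm. It assumes a single integer key column, sorts the key values, cuts them into consecutive chunks of the configured row count, and routes each original row to the first file whose maximum is at least the row's value. The function names `value_ranges`, `target_file`, and `copy_rows` are hypothetical. In this sketch, duplicate key values that straddle a chunk boundary all go to the earlier file.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def value_ranges(values: Sequence[int], row_count: int) -> List[Tuple[int, int]]:
    """Sort the key values, split them into consecutive chunks of at most
    `row_count` rows, and return the (min, max) of each chunk."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    ordered = sorted(values)
    ranges = []
    for start in range(0, len(ordered), row_count):
        chunk = ordered[start:start + row_count]
        ranges.append((chunk[0], chunk[-1]))  # chunk is sorted: first is min, last is max
    return ranges


def target_file(value: int, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the first repartitioned file whose max is >= value."""
    maxima = [high for _, high in ranges]
    index = bisect_left(maxima, value)
    if index == len(ranges):
        raise ValueError(f"value {value} is above every range")
    return index


def copy_rows(values: Sequence[int], ranges: List[Tuple[int, int]]) -> List[List[int]]:
    """Route each original value to a repartitioned file by its value range."""
    files: List[List[int]] = [[] for _ in ranges]
    for value in values:
        files[target_file(value, ranges)].append(value)
    return files


original = [42, 7, 19, 88, 3, 61, 25, 50, 14]
ranges = value_ranges(original, row_count=4)
print(ranges)                       # [(3, 19), (25, 61), (88, 88)]
print(copy_rows(original, ranges))  # [[7, 19, 3, 14], [42, 61, 25, 50], [88]]
```

Computing the ranges once, before any rows are copied, means each row can be placed with a single lookup against the sorted list of maximum values.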