Data Validation Option User Guide

Data Sampling for a Table Pair

You can use data sampling to run tests on a subset of a data set. You might use data sampling when the data set is large. You can perform data sampling on table pairs and single tables.
You can perform data sampling on one table in a table pair. When you run a table-pair test, Data Validation Option runs the test based on a percentage of the rows in the sampled table and all of the rows in the other table.
You can use a seed value to repeat the same sample data set in multiple runs of a test. Data Validation Option uses the seed value as the starting value to generate a random number. If you do not enter a seed value, Data Validation Option generates a random seed value for each test run. You might use a seed value to replicate a Data Validation Option test.
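The effect of a seed value can be illustrated with a minimal Python sketch. This is not how Data Validation Option or PowerCenter implements sampling; the `sample_rows` function and its parameters are hypothetical names used only to show why a fixed seed reproduces the same sample while an omitted seed does not.

```python
import random


def sample_rows(rows, percent, seed=None):
    """Return roughly `percent` percent of `rows`, chosen pseudo-randomly.

    Hypothetical helper for illustration only.
    """
    # A fixed seed yields the same sequence of random numbers, so the same
    # rows are selected on every run. With seed=None, Python seeds the
    # generator from system entropy, so each run selects different rows.
    rng = random.Random(seed)
    return [row for row in rows if rng.random() * 100 < percent]


rows = list(range(1, 1001))

first_run = sample_rows(rows, percent=10, seed=42)
second_run = sample_rows(rows, percent=10, seed=42)

# Both runs used the same seed, so they selected the same rows.
print(first_run == second_run)  # True
```

Because the sketch selects each row independently, the sample size is only approximately 10 percent of the table.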
By default, PowerCenter performs the sampling. If you sample data from IBM DB2, Microsoft SQL Server, Oracle, or Teradata, you can perform native sampling in the database. Push sampling to the database to increase performance.
If you add a WHERE clause and enable sampling, the order of operations depends on where you perform sampling and where you execute the WHERE clause. Generally, PowerCenter and the database perform sampling before executing the WHERE clause. However, when you configure the database to execute the WHERE clause and PowerCenter to perform sampling, the database executes the WHERE clause before PowerCenter performs the sampling.
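The order of operations matters because sampling and filtering do not commute: the rows that survive depend on which step runs first. The following Python sketch illustrates this under simplifying assumptions. The `sample` and `where` helpers, the `region` column, and the row data are all hypothetical and do not represent the product's internal behavior.

```python
import random


def sample(rows, percent, seed):
    """Keep roughly `percent` percent of rows (hypothetical helper)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() * 100 < percent]


def where(rows, predicate):
    """Keep only the rows that satisfy the predicate (hypothetical helper)."""
    return [row for row in rows if predicate(row)]


def is_east(row):
    return row["region"] == "EAST"


# Hypothetical table: every fourth row belongs to the EAST region.
rows = [
    {"id": i, "region": "EAST" if i % 4 == 0 else "WEST"}
    for i in range(1, 2001)
]

# Sampling first: the sample is drawn from all rows, then filtered.
sample_then_filter = where(sample(rows, 10, seed=7), is_east)

# WHERE clause first: the sample is drawn only from rows that match.
filter_then_sample = sample(where(rows, is_east), 10, seed=7)

print(len(sample_then_filter), len(filter_then_sample))
```

In both cases every returned row satisfies the filter, but the two orders generally select different rows, so a test can produce different results depending on the configuration.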
Because data sampling reduces the number of rows in one table of the table pair, some tests might have different results based on whether you enable data sampling. For example, a Count test might fail after you enable data sampling on one table in a table pair because Data Validation Option processes fewer rows in one of the tables in the table pair. An Outer Value test might fail after you enable data sampling because the test encounters orphan rows when it performs a full outer join between the tables in the table pair.
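A small Python sketch can show why these two tests are sensitive to sampling. It is a simplification under stated assumptions: the tables are reduced to lists of unique keys, the `sample` helper is hypothetical, and the comparisons only approximate what a Count test and an Outer Value test check.

```python
import random


def sample(rows, percent, seed):
    """Keep roughly `percent` percent of rows (hypothetical helper)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() * 100 < percent]


# Two hypothetical tables with identical, unique keys.
source_keys = list(range(1, 1001))
target_keys = list(range(1, 1001))

# Sample only one table of the pair; the other table keeps all of its rows.
sampled_source = sample(source_keys, percent=10, seed=1)

# Count-style comparison: passes on the full tables, fails after sampling.
counts_match_full = len(source_keys) == len(target_keys)
counts_match_sampled = len(sampled_source) == len(target_keys)

# Outer-value-style comparison: keys present on only one side of a full
# outer join are orphans. Sampling removes keys from one side, so the
# unsampled table now contains keys that have no match.
orphans = set(sampled_source) ^ set(target_keys)

print(counts_match_full, counts_match_sampled, len(orphans))
```

Every key dropped by sampling becomes an orphan in the other table, which is why tests that assume matching row sets on both sides are poor candidates for sampled table pairs.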