Table of Contents

Search

  1. Preface
  2. Introduction to Data Validation Option
  3. New Features and Behavior Changes
  4. Repositories
  5. XML Data Source
  6. Tests for XML Data Sources
  7. Connections
  8. Expressions
  9. Table Pairs
  10. Tests for Table Pairs
  11. Single-Table Constraints
  12. Tests for Single-Table Constraints
  13. Examples of Tests from Spreadsheets
  14. SQL Views
  15. Lookup Views
  16. Join Views
  17. Aggregate Views
  18. Business Intelligence and Reporting Tools Reports
  19. Dashboards
  20. DVOCmd Command Line Program
  21. Troubleshooting
  22. Datatype Reference
  23. Reporting Views
  24. Metadata Import Syntax
  25. Jasper Reports
  26. Glossary

Data Validation Option User Guide

Data Validation Option User Guide

Data Sampling for a Single-Table Constraint

Data Sampling for a Single-Table Constraint

You can use data sampling to run tests on a subset of a dataset. You might use data sampling when the data set is large. You can perform data sampling on table pairs and single tables.
When you run a test on a sample data set, Data Validation Option runs the test on a percentage of the data set. The sample percentage that you specify represents the chance that each data is included in the sample.
You can use a seed value to repeat the same sample data set in multiple runs of a test. Data Validation uses the seed value as the starting value to generate a random number. If you do not enter a seed value, Data Validation Option generates a random seed value for each test run. You might use a seed value to replicate a Data Validation Option test.
By default, PowerCenter performs the sampling. If you sample data from IBM DB2, Microsoft SQL Server, Oracle, or Teradata, you can perform native sampling in the database. Push sampling to the database to increase performance.
If you add the WHERE clause and enable sampling, the order of operations depend on where you perform sampling and execute the WHERE clause. Generally, PowerCenter and the database performs sampling before executing the WHERE clause. However, when you configure the database to execute the WHERE clause and PowerCenter to perform sampling, the database executes the WHERE clause before PowerCenter performs the sampling.
Because data sampling reduces the number of rows, some tests might have different results based on whether you enable data sampling. For example, a SUM test might fail after you enable data sampling because Data Validation Option processes less rows.