Data Validation Option User Guide

Data Sampling for a Single-Table Constraint

You can use data sampling to run tests on a subset of a data set. You might use data sampling when the data set is large. You can perform data sampling on table pairs and single tables.
When you run a test on a sample data set, Data Validation Option runs the test on a percentage of the data set. The sample percentage that you specify represents the chance that each row is included in the sample.
You can use a seed value to repeat the same sample data set in multiple runs of a test. Data Validation Option uses the seed value as the starting value to generate a random number. If you do not enter a seed value, Data Validation Option generates a random seed value for each test run. You might use a seed value to replicate a Data Validation Option test.
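The relationship between the sample percentage and the seed can be illustrated with a minimal Python sketch. This is not how Data Validation Option or PowerCenter implements sampling internally; the function name `sample_rows` and its parameters are hypothetical and exist only for illustration. The sketch assumes each row is kept independently with a probability equal to the sample percentage, and that the same seed yields the same sequence of random numbers.

```python
import random

def sample_rows(rows, percentage, seed=None):
    """Keep each row independently with probability percentage / 100.

    Hypothetical helper for illustration only. With seed=None, Python
    seeds the generator itself, so each run can pick different rows.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() * 100 < percentage]

rows = list(range(1, 1001))

# Same seed, same percentage, same input order: identical samples.
first_run = sample_rows(rows, 10, seed=42)
second_run = sample_rows(rows, 10, seed=42)
print(first_run == second_run)
```

Because the percentage is a per-row chance, the number of rows in the sample is close to, but not necessarily exactly, the requested percentage of the data set.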
By default, PowerCenter performs the sampling. If you sample data from IBM DB2, Microsoft SQL Server, Oracle, or Teradata, you can perform native sampling in the database. Push sampling to the database to increase performance.
If you add a WHERE clause and enable sampling, the order of operations depends on where you perform sampling and where you execute the WHERE clause. Generally, PowerCenter and the database perform sampling before executing the WHERE clause. However, when you configure the database to execute the WHERE clause and PowerCenter to perform sampling, the database executes the WHERE clause before PowerCenter performs the sampling.
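The following Python sketch shows why the order matters. It is a simplified model, not product behavior: the helpers `sample` and `where`, the `region` column, and the seed and percentage values are all assumptions made for illustration. Both orders return only rows that satisfy the filter, but they generally select different rows and different row counts.

```python
import random

def sample(rows, percentage, seed):
    """Hypothetical sampler: keep each row with probability percentage / 100."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() * 100 < percentage]

def where(rows, predicate):
    """Hypothetical stand-in for a WHERE clause."""
    return [r for r in rows if predicate(r)]

def is_eu(row):
    return row["region"] == "EU"

# Illustrative data: every fourth row belongs to the EU region.
rows = [{"id": i, "region": "EU" if i % 4 == 0 else "US"} for i in range(1000)]

# Sampling first, then the WHERE clause (the general case).
sample_then_filter = where(sample(rows, 20, seed=7), is_eu)

# WHERE clause first, then sampling (database filters, PowerCenter samples).
filter_then_sample = sample(where(rows, is_eu), 20, seed=7)

print(len(sample_then_filter), len(filter_then_sample))
```

In the first order, the sample percentage applies to the whole data set. In the second, it applies only to the rows that already passed the filter.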
Because data sampling reduces the number of rows, some tests might have different results based on whether you enable data sampling. For example, a SUM test might fail after you enable data sampling because Data Validation Option processes fewer rows.
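As a rough illustration of the SUM case, the Python sketch below compares the sum of a full column with the sum of a sampled copy. The `sample_rows` helper, the data, and the expected value are hypothetical and do not represent the product's test logic.

```python
import random

def sample_rows(rows, percentage, seed):
    """Hypothetical sampler: keep each row with probability percentage / 100."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() * 100 < percentage]

amounts = [100] * 1000          # illustrative column of 1,000 equal values
expected_sum = 100_000          # value a SUM test might be compared against

full_sum = sum(amounts)
sampled_sum = sum(sample_rows(amounts, 25, seed=3))

print(full_sum == expected_sum)     # the full data set matches the expected value
print(sampled_sum == expected_sum)  # the 25% sample does not
```

A comparison that holds for the full data set can therefore fail on the sample even though the underlying data is unchanged.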
