You can use data sampling to run tests on a subset of a data set, for example when the data set is large. You can perform data sampling on table pairs and single tables.
When you run a test on a sample data set, Data Validation Option runs the test on a percentage of the data set. The sample percentage that you specify represents the chance that each row is included in the sample.
You can use a seed value to repeat the same sample data set in multiple runs of a test, for example to replicate a Data Validation Option test. Data Validation Option uses the seed value as the starting value to generate a random number. If you do not enter a seed value, Data Validation Option generates a random seed value for each test run.
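The following Python sketch illustrates the idea of percentage-based sampling with a seed. It is a conceptual illustration only, not how Data Validation Option or PowerCenter implements sampling. The function name `sample_rows` and its parameters are hypothetical.

```python
import random

def sample_rows(rows, percentage, seed=None):
    """Keep each row independently with probability percentage / 100."""
    # With seed=None, Python seeds the generator from system entropy,
    # which mirrors the "random seed for each test run" behavior.
    rng = random.Random(seed)
    threshold = percentage / 100.0
    return [row for row in rows if rng.random() < threshold]

rows = list(range(1, 1001))  # stand-in for the rows of a table

# The same seed reproduces the same sample across runs.
first = sample_rows(rows, percentage=10, seed=42)
second = sample_rows(rows, percentage=10, seed=42)
print(first == second)
```

Because each row is kept or dropped independently, the number of sampled rows in this sketch is close to the requested percentage but not necessarily exact.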
By default, PowerCenter performs the sampling. If you sample data from IBM DB2, Microsoft SQL Server, Oracle, or Teradata, you can perform native sampling in the database. Push sampling to the database to increase performance.
If you add a WHERE clause and enable sampling, the order of operations depends on where you perform sampling and where the WHERE clause is executed. Generally, PowerCenter and the database perform sampling before executing the WHERE clause. However, when you configure the database to execute the WHERE clause and PowerCenter to perform sampling, the database executes the WHERE clause before PowerCenter performs the sampling.
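To see why the order matters, the following Python sketch applies a filter and a seeded sample in both orders. It is a conceptual illustration under assumed data, not the product's implementation. The helpers `sample` and `where`, the `region` column, and the seed are all hypothetical.

```python
import random

def sample(rows, percentage, seed):
    """Keep each row independently with probability percentage / 100."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < percentage / 100.0]

def where(rows, predicate):
    """Stand-in for a WHERE clause: keep rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

# Hypothetical table: every fourth row belongs to region "EU".
rows = [{"id": i, "region": "EU" if i % 4 == 0 else "US"} for i in range(1000)]

def is_eu(row):
    return row["region"] == "EU"

# Sampling first: the sample is drawn from all 1000 rows, then filtered.
sample_then_filter = where(sample(rows, 20, seed=7), is_eu)

# WHERE clause first: the sample is drawn only from the 250 matching rows.
filter_then_sample = sample(where(rows, is_eu), 20, seed=7)

print(len(sample_then_filter), len(filter_then_sample))
```

Both results contain only rows that match the filter, but the sample is drawn from a different population in each case, so the two orders generally select different rows even with the same seed.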
Because data sampling reduces the number of rows, some tests might have different results based on whether you enable data sampling. For example, a SUM test might fail after you enable data sampling because Data Validation Option processes fewer rows.
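As a rough illustration of the SUM case, the following Python sketch compares a column total against an expected value with and without sampling. The data, the expected total, and the helper `sample_rows` are hypothetical, and the comparison is only a stand-in for a real test.

```python
import random

def sample_rows(rows, percentage, seed):
    """Keep each row independently with probability percentage / 100."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < percentage / 100.0]

amounts = [10] * 1000        # hypothetical column with positive values
expected_total = 10000       # hypothetical value that the SUM test expects

full_sum = sum(amounts)
sampled_sum = sum(sample_rows(amounts, 10, seed=1))

# The check passes on the full data set. On the sample, the sum covers
# fewer rows, so it cannot exceed the full sum and usually falls short.
print(full_sum == expected_total)
print(sampled_sum == expected_total)
```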