You can use data sampling to run tests on a subset of a data set. You might use data sampling when the data set is large. You can perform data sampling on table pairs and single tables.
You can perform data sampling on one table in a table pair. When you run a table-pair test, Data Validation Option runs the test based on a percentage of the rows in the sampled table and all of the rows in the other table.
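The following Python sketch models the rows that a table-pair test compares when only one table is sampled. It assumes row-level Bernoulli sampling, in which each row is kept independently with a probability equal to the sampling percentage. The sampling method that PowerCenter or the database actually uses might differ, and the function names are hypothetical.

```python
import random


def sample_rows(rows, percent, rng):
    """Keep each row independently with probability percent / 100 (assumed Bernoulli sampling)."""
    return [row for row in rows if rng.random() < percent / 100]


def table_pair_inputs(table_a, table_b, percent, sample_side="A", seed=None):
    """Return the rows a table-pair test compares when only one table is sampled.

    Hypothetical helper: the sampled table contributes a percentage of its rows,
    and the other table contributes all of its rows.
    """
    rng = random.Random(seed)
    if sample_side == "A":
        return sample_rows(table_a, percent, rng), list(table_b)
    if sample_side == "B":
        return list(table_a), sample_rows(table_b, percent, rng)
    raise ValueError("sample_side must be 'A' or 'B'")


table_a = list(range(1000))
table_b = list(range(1000))
rows_a, rows_b = table_pair_inputs(table_a, table_b, percent=10, sample_side="A", seed=42)
# rows_a holds a sample of table_a; rows_b holds all 1000 rows of table_b.
print(len(rows_a), len(rows_b))
```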
You can use a seed value to repeat the same sample data set in multiple runs of a test. Data Validation Option uses the seed value as the starting value for generating random numbers. If you do not enter a seed value, Data Validation Option generates a random seed value for each test run. You might use a seed value to replicate a Data Validation Option test.
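As an illustration of why a seed value makes a sample repeatable, the following Python sketch seeds a random number generator and uses it to select rows. A run without a seed generates one, and reusing that seed reproduces the same sample. The helper name and the seed range are assumptions for illustration, not the way Data Validation Option stores or generates seeds.

```python
import random


def run_sampled_test(rows, percent, seed=None):
    """Sample rows for one test run and return the seed that produced the sample.

    Hypothetical helper: if no seed is given, one is generated for this run.
    """
    if seed is None:
        seed = random.randrange(2**31)  # assumed seed range, for illustration only
    rng = random.Random(seed)  # the seed is the starting value of the generator
    sample = [row for row in rows if rng.random() < percent / 100]
    return seed, sample


rows = list(range(10_000))

# Run 1: no seed entered, so a random seed is generated for this run.
seed, first_sample = run_sampled_test(rows, percent=5)

# Run 2: reuse the seed from run 1 to replicate its sample.
_, replayed_sample = run_sampled_test(rows, percent=5, seed=seed)
assert replayed_sample == first_sample
```

The sample is only reproducible if the underlying rows and their order are also unchanged between runs, because the generator decides row by row in the order the rows arrive.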
By default, PowerCenter performs the sampling. If you sample data from IBM DB2, Microsoft SQL Server, Oracle, or Teradata, you can perform native sampling in the database. Push sampling to the database to increase performance.
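With native sampling, the database returns only the sampled rows, so fewer rows are read into PowerCenter. The following Python function builds SELECT statements that use the sampling clause each of these databases provides. It is a sketch only: the SQL that Data Validation Option actually pushes to the database is not described here and might differ, the function name is hypothetical, and the databases sample in different ways (for example, SQL Server TABLESAMPLE selects data pages rather than individual rows, so the row count is approximate).

```python
def native_sample_query(database, table, percent):
    """Return a SELECT statement that samples about `percent` percent of `table`.

    Hypothetical helper for illustration; each branch uses the vendor's own
    sampling clause.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be between 0 and 100")
    if database == "Oracle":
        return f"SELECT * FROM {table} SAMPLE ({percent})"
    if database == "Microsoft SQL Server":
        return f"SELECT * FROM {table} TABLESAMPLE ({percent} PERCENT)"
    if database == "IBM DB2":
        return f"SELECT * FROM {table} TABLESAMPLE BERNOULLI ({percent})"
    if database == "Teradata":
        # Teradata's SAMPLE clause takes a fraction of the rows, not a percentage.
        return f"SELECT * FROM {table} SAMPLE {percent / 100}"
    raise ValueError(f"native sampling is not sketched for {database}")


for db in ("Oracle", "Microsoft SQL Server", "IBM DB2", "Teradata"):
    print(native_sample_query(db, "ORDERS", 10))
```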
If you add a WHERE clause and enable sampling, the order of operations depends on where sampling is performed and where the WHERE clause is executed. Generally, both PowerCenter and the database perform sampling before they execute the WHERE clause. However, when you configure the database to execute the WHERE clause and PowerCenter to perform sampling, the database executes the WHERE clause before PowerCenter performs the sampling.
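The following Python sketch contrasts the two orders of operations. It assumes row-level Bernoulli sampling and models the WHERE clause as a Python predicate; the function names and the predicate are hypothetical.

```python
import random


def sample_percent(rows, percent, seed):
    """Keep each row with probability percent / 100 (assumed Bernoulli sampling)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100]


def sample_then_where(rows, percent, seed, where):
    # Default order: sampling runs first, and the WHERE clause filters the sample.
    return [row for row in sample_percent(rows, percent, seed) if where(row)]


def where_then_sample(rows, percent, seed, where):
    # Database executes the WHERE clause, then PowerCenter samples the filtered rows.
    return sample_percent([row for row in rows if where(row)], percent, seed)


rows = list(range(10_000))


def where(row):
    return row % 100 == 0  # hypothetical WHERE clause that matches 100 rows


a = sample_then_where(rows, 10, seed=7, where=where)
b = where_then_sample(rows, 10, seed=7, where=where)
print(len(a), len(b), a == b)
```

With percentage-based sampling, both orders keep about 10% of the 100 matching rows on average, but they generally keep different rows even with the same seed, because the random numbers are drawn against different row sets. Test results from the two configurations are therefore not directly comparable.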
Because data sampling reduces the number of rows in one table of the table pair, some tests might have different results depending on whether you enable data sampling. For example, a Count test might fail after you enable data sampling on one table in a table pair because Data Validation Option processes fewer rows in the sampled table than in the other table. An Outer Value test might fail after you enable data sampling because the full outer join between the tables in the table pair finds orphan rows: rows in the unsampled table that have no match among the sampled rows.
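The following Python sketch shows why sampling one table can make two identical tables fail a row count comparison and a full outer join comparison. The checks are simplified, hypothetical stand-ins for the Count and Outer Value tests, and the sampling is assumed to be row-level Bernoulli sampling.

```python
import random


def sample_percent(rows, percent, seed):
    """Keep each row with probability percent / 100 (assumed Bernoulli sampling)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100]


def count_test(rows_a, rows_b):
    """Hypothetical Count check: passes when both tables have the same row count."""
    return len(rows_a) == len(rows_b)


def outer_join_orphans(values_a, values_b):
    """Hypothetical Outer Value check: a full outer join on the compared values.

    Returns the values found only in table A and the values found only in table B.
    """
    set_a, set_b = set(values_a), set(values_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


table_a = list(range(1, 201))  # hypothetical values in the compared column
table_b = list(range(1, 201))  # an identical copy of table A

# Without sampling, both checks pass.
print(count_test(table_a, table_b), outer_join_orphans(table_a, table_b))  # True ([], [])

# Sample half of table A and compare it with all of table B.
sampled_a = sample_percent(table_a, 50, seed=3)
only_in_a, only_in_b = outer_join_orphans(sampled_a, table_b)
# The count check now fails, and every row of B missing from the sample is an orphan.
print(count_test(sampled_a, table_b), len(only_in_b))
```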