Data Validation Option User Guide

10.5

Data Sampling for a Table Pair

You can use data sampling to run tests on a subset of a data set. You might use data sampling when the data set is large. You can perform data sampling on table pairs and single tables.

You can perform data sampling on one table in a table pair. When you run a table-pair test, Data Validation Option runs the test based on a percentage of the rows in the sampled table and all of the rows in the other table.

You can use a seed value to repeat the same sample data set in multiple runs of a test. Data Validation Option uses the seed value as the starting value to generate a random number. If you do not enter a seed value, Data Validation Option generates a random seed value for each test run. You might use a seed value to replicate a Data Validation Option test.
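The following Python sketch shows why a seed value makes a sample repeatable. It is a conceptual model only and does not reflect how PowerCenter or the database samples rows. The hypothetical `sample_rows` helper keeps each row with a probability equal to the sampling percentage. A fixed seed makes the sequence of random numbers identical from run to run, so the sample is identical too.

```python
import random


def sample_rows(rows, percent, seed=None):
    """Hypothetical helper: keep each row with probability percent/100.

    A fixed seed reproduces the same random sequence, and therefore the
    same sample, on every run. seed=None draws a fresh seed each time.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100.0]


rows = list(range(1, 10001))  # hypothetical table of 10,000 row IDs

run1 = sample_rows(rows, 10, seed=42)
run2 = sample_rows(rows, 10, seed=42)
print(run1 == run2)  # True: same seed, same sample
print(len(run1))     # roughly 1,000 rows (about 10 percent)
```

Passing `seed=None` seeds the generator from an unpredictable source. That corresponds to running a test without a seed value, where each run can sample different rows.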

By default, PowerCenter performs the sampling. If you sample data from IBM DB2, Microsoft SQL Server, Oracle, or Teradata, you can perform native sampling in the database. Push sampling to the database to increase performance.

If you add a WHERE clause and enable sampling, the order of operations depends on where sampling is performed and where the WHERE clause is executed. Generally, PowerCenter and the database perform sampling before executing the WHERE clause. However, when you configure the database to execute the WHERE clause and PowerCenter to perform sampling, the database executes the WHERE clause before PowerCenter performs the sampling.
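The following sketch is a rough illustration, not Data Validation Option internals. It assumes a simple percentage-based sampler and uses a hypothetical `is_even` predicate as a stand-in for the WHERE clause. With the same seed, sampling before filtering and filtering before sampling select different rows.

```python
import random


def sample_rows(rows, percent, seed):
    """Hypothetical sampler: keep each row with probability percent/100."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100.0]


def where(rows, predicate):
    """Keep only the rows that satisfy the predicate, like a WHERE clause."""
    return [row for row in rows if predicate(row)]


def is_even(row):
    """Hypothetical stand-in for a WHERE clause condition."""
    return row % 2 == 0


rows = list(range(1, 10001))

# Sampling first, then the WHERE clause (the general case).
sample_then_filter = where(sample_rows(rows, 10, seed=7), is_even)

# WHERE clause first (database), then sampling (PowerCenter).
filter_then_sample = sample_rows(where(rows, is_even), 10, seed=7)

print(len(sample_then_filter), len(filter_then_sample))
print(sample_then_filter == filter_then_sample)  # the row sets differ
```

Both orders return roughly 5 percent of the original rows here, but they select different rows. Test results can therefore change when you move the WHERE clause or the sampling between PowerCenter and the database.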

Because data sampling reduces the number of rows in one table of the table pair, some tests might return different results when you enable data sampling. For example, a Count test might fail after you enable data sampling on one table in a table pair because Data Validation Option processes fewer rows in that table than in the other. An Outer Value test might fail after you enable data sampling because the test encounters orphan rows when it performs a full outer join between the tables in the table pair.
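The following sketch models why a Count test can fail when only one table in a table pair is sampled. The hypothetical `count_test` helper passes when both tables return the same number of rows. It is an assumption for illustration, not the Data Validation Option implementation. Tables A and B contain identical rows, so the test passes until table A is sampled.

```python
import random


def sample_rows(rows, percent, seed):
    """Hypothetical sampler: keep each row with probability percent/100."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100.0]


def count_test(a_rows, b_rows):
    """Hypothetical Count test: pass when both tables have the same row count."""
    return len(a_rows) == len(b_rows)


table_a = list(range(1, 1001))  # hypothetical source table
table_b = list(range(1, 1001))  # hypothetical target table with identical rows

print(count_test(table_a, table_b))  # True: no sampling

sampled_a = sample_rows(table_a, 25, seed=3)
print(count_test(sampled_a, table_b))  # False: only table A is sampled

# Rows of table B with no match in the sampled table A would be orphan rows
# in a full outer join, which is why an Outer Value test can also fail.
orphans = set(table_b) - set(sampled_a)
print(len(orphans) > 0)  # True
```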
