Running data validation for database ingestion and replication jobs
For initial load jobs that completed successfully, you can run data validation to compare the source and target data. Data validation is available only for initial load jobs that have an Oracle or a SQL Server source and a Snowflake target.
The availability of the data validation feature is controlled by an organization-level feature flag. If this functionality is not available for your organization but you want to use it, contact Informatica Global Customer Support.
When you run data validation for Database Ingestion and Replication, you are charged based on the CPU consumption of the Data Validation service.
The source and target connections defined in the task for which you want to run the data validation must be on the same Secure Agent. You must enable the Data Validation service on the Secure Agent.
The source and target schemas specified in the task definition must be the same as the schemas used in the source and target connection properties.
In the Snowflake Data Cloud connection properties, enter the database and schema name in the Additional JDBC URL Parameters field in the following format:
db=<database_name>&schema=<schema_name>
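As an illustration, the following Python sketch assembles a value for the Additional JDBC URL Parameters field. The helper name and the sample database and schema names are hypothetical. Only the db=<database_name>&schema=<schema_name> format comes from the text above.

```python
def build_jdbc_url_parameters(database_name: str, schema_name: str) -> str:
    """Return a value in the form db=<database_name>&schema=<schema_name>."""
    if not database_name or not schema_name:
        raise ValueError("Both a database name and a schema name are required.")
    return f"db={database_name}&schema={schema_name}"


# Hypothetical names, used for illustration only.
print(build_jdbc_url_parameters("SALES_DB", "PUBLIC"))
```

The schema name that you enter here should be the same schema that is specified in the task definition.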
The source table and column names cannot contain any special characters. Otherwise, data validation fails.
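Before you run data validation, you might want to check the source names yourself. The following Python sketch is one way to do that. It assumes that a "special character" is anything other than an ASCII letter, a digit, or an underscore. The documentation does not define the exact set, so treat that rule, the function name, and the sample names as assumptions.

```python
import re

# Assumption: only ASCII letters, digits, and underscores count as non-special.
_PLAIN_NAME = re.compile(r"[A-Za-z0-9_]+")


def find_names_with_special_characters(names):
    """Return the names that contain a character outside the assumed safe set."""
    return [name for name in names if _PLAIN_NAME.fullmatch(name) is None]


# Hypothetical table and column names.
candidates = ["ORDERS", "ORDER_ID", "UNIT PRICE", "DISCOUNT%"]
print(find_names_with_special_characters(candidates))
```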
To prevent false alarms that result from validating unsupported data types, you can exclude these data types by using the datavalidation.datatypes.skip custom property. On the Schedule and Runtime Options page of the task wizard, enter datavalidation.datatypes.skip as the property name and a comma-separated list of data types as the property value.
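The following Python sketch shows how the property name and value pair might be put together. The data type names are placeholders, not a list of types that are known to be unsupported. The sketch also assumes that the list is written without spaces after the commas, which the text above does not specify.

```python
PROPERTY_NAME = "datavalidation.datatypes.skip"


def build_skip_property(data_types):
    """Return the property name and a comma-separated property value."""
    cleaned = [t.strip() for t in data_types if t.strip()]
    if not cleaned:
        raise ValueError("Provide at least one data type to skip.")
    return PROPERTY_NAME, ",".join(cleaned)


# Placeholder data types, used for illustration only.
name, value = build_skip_property(["BLOB", "CLOB"])
print("Property name: ", name)
print("Property value:", value)
```

In the task wizard, you enter the name and the value in separate fields.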
To display the job details, drill down on a job from the My Jobs page in the Data Integration service, the All Jobs page in the Monitor service, or the Data Ingestion and Replication page in the Operational Insights service.
On the Object Detail pane, navigate to the subtask row for which you want to run data validation. In the Actions menu for the row, select Run Data Validation.
For the Run Data Validation option to be available, the task must have the status of Completed.
Configure how the data should be validated:
Select the Flat file connection.
This connection will be used to store the data validation results.
The Flat file connection and the database ingestion and replication job must be on the same runtime environment.
In the Sample field, select how much of the data to sample for the comparison. The default value is Last 1000 Rows.
Click Run.
The data validation process starts. The Data Validation column in the Object Detail pane shows the data validation status for the selected task.
If data validation processing completes successfully, you can click the Success status to view the Data Validation Summary. The summary contains the results of the row count validation and the cell-to-cell comparison.
To download a detailed data validation report, click the Download icon. The report highlights any missing or modified rows and columns based on a comparison of the source and target tables.
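To make the two checks concrete, the following Python sketch shows what a row count validation and a cell-to-cell comparison could look like in principle. It is a conceptual illustration only, not how the Data Validation service is implemented. The function name, the report layout, and the sample rows are hypothetical, and the sketch assumes that every row has a unique key column.

```python
def compare_tables(source_rows, target_rows, key):
    """Compare two lists of row dictionaries that share the key column `key`."""
    source_by_key = {row[key]: row for row in source_rows}
    target_by_key = {row[key]: row for row in target_rows}

    report = {
        # Row count validation: do both sides hold the same number of rows?
        "row_count_match": len(source_rows) == len(target_rows),
        # Rows that exist in the source but not in the target.
        "missing_in_target": sorted(source_by_key.keys() - target_by_key.keys()),
        # Cell-to-cell comparison results: (key, column, source value, target value).
        "modified_cells": [],
    }
    for k in sorted(source_by_key.keys() & target_by_key.keys()):
        src, tgt = source_by_key[k], target_by_key[k]
        for column in src:
            if src[column] != tgt.get(column):
                report["modified_cells"].append((k, column, src[column], tgt.get(column)))
    return report


# Hypothetical sample data.
source = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
target = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Rob"}]
print(compare_tables(source, target, key="id"))
```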
If an error occurred during the data validation processing, click the Download icon next to the