You can use Data Validation Option to verify the accuracy of data after a data migration, replication, integration, or similar data movement or transformation project. You can use Cloudera Impala to run business intelligence (BI), analytics, and reporting queries on data stored in Hadoop.
Data Validation Option automates data validation and makes tests repeatable. You create tests once and run the tests each time PowerCenter loads a batch of data to the target.
Use the Cloudera ODBC Driver for Impala for direct SQL and Impala SQL access to Apache Hadoop and Impala distributions. Impala SQL is a subset of SQL-92. The driver translates the SQL queries that an application issues, including joins, into the equivalent Impala SQL. If an application is Impala-aware, you can configure the driver to pass its queries through to Impala without translating them. The driver also gets schema information from Impala and presents it to the SQL-based application.
The Cloudera ODBC Driver for Impala is available for Microsoft Windows and Linux. The driver complies with the ODBC 3.52 standard and supports Unicode and both 32-bit and 64-bit applications for high-performance computing environments on all supported platforms.
To import data from Impala, install the Cloudera ODBC Driver for Impala and configure a 32-bit ODBC connection in PowerCenter. Configure a 64-bit ODBC connection in PowerCenter to read data from the source and write it to the target at run time. You can then run Data Validation Option tests to validate data accuracy.
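The separate 32-bit and 64-bit connections exist because an ODBC client can load only a driver built for its own architecture: a 32-bit process needs a 32-bit driver and DSN, and a 64-bit process needs a 64-bit one. The following Python sketch is illustrative only and is not part of PowerCenter or Data Validation Option. It reports the bitness of the running process; the helper name `required_odbc_driver` is hypothetical.

```python
import struct


def process_bitness() -> int:
    """Return 32 or 64: the pointer width of the running Python process."""
    return struct.calcsize("P") * 8


def required_odbc_driver(bitness: int) -> str:
    """Hypothetical helper: name the driver build a process of this bitness can load."""
    # An ODBC client can load only a driver compiled for its own architecture.
    if bitness not in (32, 64):
        raise ValueError(f"unexpected bitness: {bitness}")
    return f"{bitness}-bit Cloudera ODBC Driver for Impala"


if __name__ == "__main__":
    bits = process_bitness()
    print(f"This process is {bits}-bit; it needs the {required_odbc_driver(bits)}.")
```

The same rule applies to any ODBC client, which is why a 32-bit client tool and a 64-bit run-time service each need their own connection.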
When you integrate Data Validation Option with Cloudera, the driver gets schema information from Impala. You can use Cloudera Impala as both a source and a target in Data Validation Option. You can then compare data in the source and target when you set up table pairs and rules in Data Validation Option. Data Validation Option communicates with PowerCenter to run the tests. You can view test results in the Data Validation Option Client.
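To make the idea of a table-pair comparison concrete, the following is a minimal Python sketch of the kind of checks such a test can perform: row counts, keys missing on either side, and matched rows whose values differ. It is not how Data Validation Option works internally (there, PowerCenter runs the tests). The function name `compare_table_pair`, the in-memory row format, and the assumption that the key column is unique on each side are all hypothetical.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def compare_table_pair(source: Iterable[Row], target: Iterable[Row], key: str) -> Dict[str, Any]:
    """Compare a source and target row set on a key column.

    Hypothetical sketch: assumes `key` is unique in each row set and that
    key values are mutually comparable so they can be sorted.
    """
    src_rows: List[Row] = list(source)
    tgt_rows: List[Row] = list(target)
    # Index each side by its key so rows can be matched across tables.
    src = {row[key]: row for row in src_rows}
    tgt = {row[key]: row for row in tgt_rows}

    missing_in_target = sorted(src.keys() - tgt.keys())
    extra_in_target = sorted(tgt.keys() - src.keys())
    # A matched key fails when any column value differs between the two rows.
    mismatched = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])

    return {
        "source_count": len(src_rows),
        "target_count": len(tgt_rows),
        "missing_in_target": missing_in_target,
        "extra_in_target": extra_in_target,
        "mismatched": mismatched,
        "passed": not (missing_in_target or extra_in_target or mismatched),
    }


if __name__ == "__main__":
    source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}, {"id": 3, "amount": 30.0}]
    target = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 25.0}]
    print(compare_table_pair(source, target, key="id"))
```

Because the function takes only the two row sets and a key, it can be rerun unchanged after every load, which mirrors the "create once, run each time" approach that Data Validation Option applies to its own tests.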