You can use the following Hadoop connections: Hive and Hadoop Distributed File System (HDFS). You can create Hive and HDFS connections in Test Data Manager, and import the Hadoop data sources in to a project. In a Hadoop plan, you can select the Hadoop connections as source, target, or both.
The Hive database schema might contain temporary junk tables that are created when you run a mapping. The following sample formats are the junk tables in a Hive database schema:
Ensure that you do not select any temporary tables when you import data sources.
You can create a Hadoop plan to move data from Hadoop sources, flat files, or relational databases such as Oracle, DB2, ODBC-Sybase, and ODBC-Microsoft SQL Server into Hive or HDFS targets. You can also create a Hadoop plan when you want to move data between Hadoop sources and targets. If the source is HDFS, you can move data to a Hive or an HDFS target. If the source is Hive, you can move data to a Hive or an HDFS target.
To run a Hadoop plan, TDM uses Data Integration Service that is configured for pushdown optimization. When you generate and run the Hadoop plan, TDM generates the mappings and the Data Integration Service pushes the mappings to the Hadoop cluster to improve the performance. You can use a Blaze or a Hive execution engine to run Hadoop mappings. When you select an HDFS target connection, you can use Avro or Parquet resource formats to mask data.
You cannot perform data subset or data generation operations for Hadoop sources and targets.