Hadoop Data Sources

You can perform data movement and data masking operations on Hadoop data sources.
You can use the following Hadoop connections: Hive and Hadoop Distributed File System (HDFS). You can import Hive and HDFS connections in Test Data Manager. In a Hadoop plan, you can select the Hadoop connections as source, target, or both.
In a Hive database schema, there might be temporary junk tables that were created while running a mapping. The following are sample names of junk tables in a Hive database schema:
w1413372528_infa_generatedsource_1_alpha_check
w1413372528_write_employee1_group_cast_alpha_check
When you import data sources from a Hive database that contains temporary junk tables, ensure that you do not select those tables.
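One way to avoid selecting junk tables is to filter the imported table list against the naming pattern shown above. This is a hypothetical sketch, not a TDM feature: it assumes junk-table names start with `w` followed by a numeric timestamp and end with `_alpha_check`, as in the two sample names.

```python
import re

# Assumed pattern, inferred from the sample junk-table names above:
# "w" + numeric timestamp + "_" + mapping details + "_alpha_check".
JUNK_TABLE_PATTERN = re.compile(r"^w\d+_.*_alpha_check$")

def filter_junk_tables(table_names):
    """Return only the tables that do not match the junk-table pattern."""
    return [name for name in table_names if not JUNK_TABLE_PATTERN.match(name)]

tables = [
    "employee1",
    "w1413372528_infa_generatedsource_1_alpha_check",
    "w1413372528_write_employee1_group_cast_alpha_check",
    "orders",
]
print(filter_junk_tables(tables))  # ['employee1', 'orders']
```

If a legitimate table name happens to match this pattern, the filter would wrongly exclude it, so review the filtered list rather than applying it blindly.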
You can create a Hadoop plan to move data from Hadoop sources, flat files, or relational databases such as Oracle, DB2, ODBC-Sybase, and ODBC-Microsoft SQL Server into Hive or HDFS targets. You can also create a Hadoop plan to move data between Hadoop sources and targets: whether the source is Hive or HDFS, you can move data to either a Hive or an HDFS target.
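The supported source-to-target combinations described above can be summarized as a small lookup. This is an illustrative sketch, not a TDM API; the names are assumptions for the example:

```python
# Targets a Hadoop plan can write to, per the description above.
HADOOP_TARGETS = {"Hive", "HDFS"}

# Sources a Hadoop plan can read from, per the description above.
SUPPORTED_SOURCES = {
    "Hive", "HDFS", "Flat file", "Oracle", "DB2",
    "ODBC-Sybase", "ODBC-Microsoft SQL Server",
}

def is_valid_hadoop_plan(source, target):
    """A Hadoop plan moves data from any supported source into Hive or HDFS."""
    return source in SUPPORTED_SOURCES and target in HADOOP_TARGETS

print(is_valid_hadoop_plan("Oracle", "Hive"))  # True
print(is_valid_hadoop_plan("HDFS", "Oracle"))  # False: Hadoop plans only write to Hive or HDFS
```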
To run a Hadoop plan, TDM uses a Data Integration Service that supports the Hadoop pushdown optimization feature. When you generate and run the Hadoop plan, TDM generates the mappings and submits them to the Data Integration Service. The Data Integration Service applies pushdown optimization and runs the mappings on the Hadoop cluster to improve performance.
You cannot perform data subset or data generation operations for Hadoop sources and targets.