User Guide

10.4.1

Hive and HDFS Data Sources

You can perform data movement, data domain discovery, and data masking operations on Hive and Hadoop Distributed File System (HDFS) data sources.

You can use Hive and HDFS connections in a Hadoop plan. When you use a Hive or an HDFS connection, TDM uses the Data Integration Service to run the mappings in the Hadoop cluster.

You can create Hive and HDFS connections in Test Data Manager, and import the Hadoop data sources into a project. In a Hadoop plan, you can select Hive and HDFS connections as source, target, or both.

You must configure a cluster configuration in the Administrator tool before you perform TDM operations on Hive and HDFS sources. A cluster configuration is an object that contains configuration information about the Hadoop cluster. The cluster configuration enables the Data Integration Service to push mapping logic to the Hadoop environment.

The Hive database schema might contain temporary junk tables that are created when you run a mapping. The following are sample names of junk tables in a Hive database schema:

w1413372528_infa_generatedsource_1_alpha_check

w1413372528_write_employee1_group_cast_alpha_check

Ensure that you do not select any temporary tables when you import data sources.
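As an illustration, a candidate table list could be screened before import with a small script. This is only a sketch: the pattern below (a leading "w", a run of digits, then an underscore) is an assumption inferred from the two sample names above, not a documented naming rule, and the function name and the other table names are hypothetical.

```python
import re

# Assumed pattern, inferred only from the two sample names above:
# a leading "w", a run of digits, then an underscore.
TEMP_TABLE_PATTERN = re.compile(r"^w\d+_")


def split_temporary_tables(table_names):
    """Return (importable, temporary) lists, preserving input order."""
    importable, temporary = [], []
    for name in table_names:
        # Lowercase the name before matching so the check ignores case.
        if TEMP_TABLE_PATTERN.match(name.lower()):
            temporary.append(name)
        else:
            importable.append(name)
    return importable, temporary


tables = [
    "employee",  # hypothetical table name
    "w1413372528_infa_generatedsource_1_alpha_check",
    "w1413372528_write_employee1_group_cast_alpha_check",
    "warehouse_orders",  # hypothetical; starts with "w" but has no digits
]
importable, temporary = split_temporary_tables(tables)
print(importable)
```

Review the flagged names manually before excluding them, because a legitimate table could match the assumed pattern.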

You can create a Hadoop plan to move data from Hive, HDFS, flat files, or relational databases such as Oracle, DB2, ODBC-Sybase, and ODBC-Microsoft SQL Server into Hive or HDFS targets. You can also create a Hadoop plan to move data between Hive and HDFS sources and targets. Whether the source is Hive or HDFS, the target can be either Hive or HDFS. In a Hadoop plan, you can also extract data from Hive and HDFS to a flat file.
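The source and target combinations described above can be summarized as a lookup table. The following sketch is illustrative only: the string labels are not TDM identifiers, and treating any combination the text does not mention (for example, flat file to flat file) as unsupported is an assumption.

```python
# Source-to-target combinations as described in the paragraph above.
# The labels are illustrative, not TDM identifiers.
ALLOWED_TARGETS = {
    "hive": {"hive", "hdfs", "flat_file"},
    "hdfs": {"hive", "hdfs", "flat_file"},
    "flat_file": {"hive", "hdfs"},
    "relational": {"hive", "hdfs"},
}


def is_supported_movement(source, target):
    """Return True if the text above lists this source/target pair.

    Assumption: pairs that the text does not mention are unsupported.
    """
    return target in ALLOWED_TARGETS.get(source, set())


print(is_supported_movement("relational", "hive"))
print(is_supported_movement("hive", "relational"))
```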

To run a Hadoop plan, TDM uses a Data Integration Service that is configured for pushdown optimization. When you generate and run the Hadoop plan, TDM generates the mappings and the Data Integration Service pushes the mappings to the Hadoop cluster to improve performance. You can use a Blaze execution engine to run Hadoop mappings. When you select an HDFS target connection, you can use Avro or Parquet resource formats to mask data.

You cannot perform data subset or data generation operations on Hive and HDFS sources and targets.

Hive Inplace Masking

You can perform an inplace masking operation on Hive data sources. Use a Spark execution engine to run the mappings in the cluster. When you use a Spark engine, you can perform shuffle and substitution masking if you use the JDBC connection type to create the dictionary connection.

Before you perform an inplace masking operation on Hive data sources, you must back up the source tables. If the data movement from the staging tables to the source tables fails, TDM truncates the source tables and data might be lost.
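One possible way to take such a backup is a HiveQL CREATE TABLE AS SELECT (CTAS) copy of each source table. The sketch below only builds the statement text and is not part of TDM: the function name, the "_bak" suffix, and the schema and table names are hypothetical. A CTAS copy might not preserve partitioning or table properties, so check that it meets your recovery requirements.

```python
import re

# Accept only plain identifiers so the generated statement is safe to run.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def backup_statement(schema, table, suffix="_bak"):
    """Build a HiveQL CTAS statement that copies a table.

    The "_bak" suffix is a hypothetical naming choice, not a TDM convention.
    """
    for part in (schema, table, table + suffix):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unsafe identifier: {part!r}")
    return (
        f"CREATE TABLE {schema}.{table}{suffix} "
        f"AS SELECT * FROM {schema}.{table}"
    )


# Hypothetical schema and table names.
print(backup_statement("hr", "employee"))
```

You could run the generated statement through any Hive client before you start the inplace masking plan.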

