Avro and Parquet Data Sources

When you select an HDFS target connection, use Avro or Parquet resource formats to mask data and to move data in groups.
Avro and Parquet are semi-structured data sources. Apache Avro is a data serialization system that stores data in a binary or other compact format, so Avro data might not be directly human-readable. Apache Parquet is a columnar storage format that can be processed in a Hadoop environment and that uses a record shredding and assembly algorithm. Use Avro and Parquet sources for files with a single-level hierarchy.
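A single-level hierarchy means that every field in the schema holds a primitive value, with no nested records, arrays, or maps. The following sketch illustrates such a flat Avro schema. It is a generic example that uses the third-party fastavro Python library, not a Test Data Manager command, and the schema and file name are hypothetical.

    import fastavro

    # A flat (single-level) Avro schema: every field is a primitive type,
    # with no nested records, arrays, or maps.
    schema = {
        "type": "record",
        "name": "Customer",
        "fields": [
            {"name": "id",    "type": "int"},
            {"name": "name",  "type": "string"},
            {"name": "email", "type": "string"},
        ],
    }

    records = [
        {"id": 1, "name": "Ana",  "email": "ana@example.com"},
        {"id": 2, "name": "Ben",  "email": "ben@example.com"},
    ]

    # Write the records in Avro binary format; the resulting file is not
    # directly human-readable.
    with open("customer.avro", "wb") as out:
        fastavro.writer(out, fastavro.parse_schema(schema), records)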
You can move data into the target with Avro and Parquet resource formats when you use the Hive, Blaze, or Spark engine.
If you use the Parquet format, you cannot use null or repeated constraints. The table must not contain null values in any column or row. If a column contains null values, restrict that column before data ingestion. You cannot run profiles on Avro and Parquet source formats.
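As an illustration of the no-null requirement, the following sketch drops any column that contains null values before it writes the Parquet file. It is a generic example that uses the pyarrow Python library, not a Test Data Manager feature, and the column names and file name are hypothetical.

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Hypothetical source rows; the "phone" column contains a null value.
    rows = {
        "id":    [1, 2, 3],
        "name":  ["Ana", "Ben", "Cato"],
        "phone": ["555-0100", None, "555-0102"],
    }

    table = pa.table(rows)

    # Keep only the columns with no null values so that the ingested
    # table satisfies the no-null requirement for Parquet sources.
    non_null_columns = [name for name in table.column_names
                        if table.column(name).null_count == 0]
    pq.write_table(table.select(non_null_columns), "customer.parquet")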