Table of Contents

Search

  1. Preface
  2. Introduction to Test Data Management
  3. Test Data Manager
  4. Projects
  5. Policies
  6. Data Discovery
  7. Creating a Data Subset
  8. Performing a Data Masking Operation
  9. Data Masking Techniques and Parameters
  10. Data Generation
  11. Data Generation Techniques and Parameters
  12. Working with Test Data Warehouse
  13. Analyzing Test Data with Data Coverage
  14. Plans and Workflows
  15. Monitor
  16. Reports
  17. ilmcmd
  18. tdwcmd
  19. tdwquery
  20. Data Type Reference
  21. Data Type Reference for Test Data Warehouse
  22. Data Type Reference for Hadoop
  23. Glossary

Avro and Parquet Data Sources

Avro and Parquet Data Sources

When you select an HDFS target connection, use Avro or Parquet resource formats to mask data and to move data in groups.
Avro and Parquet are semi-structured data sources. Apache Avro is a data serialization system in binary or other data formats and the Avro data is in a format that might not be directly human-readable. Apache Parquet is a columnar storage format that can be processed in a Hadoop environment and uses a record shredding and assembly algorithm. Use Avro and Parquet sources for single-level hierarchy files.
You can move data into the target with Avro and Parquet resource formats when you use a Hive, a Blaze, or a Spark engine.
If you use Parquet format, you cannot use null or repeated constraints. The table must not contain any null value in a column or a row. If there is any such column, you must restrict the column before data ingestion. You cannot run profiles on Avro and Parquet source formats.