When you select an HDFS target connection, use Avro or Parquet resource formats to mask data and to move data in groups.
Avro and Parquet are semi-structured data sources. Apache Avro is a data serialization system in binary or other data formats and the Avro data is in a format that might not be directly human-readable. Apache Parquet is a columnar storage format that can be processed in a Hadoop environment and uses a record shredding and assembly algorithm. Use Avro and Parquet sources for single-level hierarchy files.
You can move data into the target with Avro and Parquet resource formats when you use a Hive, a Blaze, or a Spark engine.
If you use Parquet format, you cannot use null or repeated constraints. The table must not contain any null value in a column or a row. If there is any such column, you must restrict the column before data ingestion. You cannot run profiles on Avro and Parquet source formats.