User Guide

Data Masking Transformation on the Spark Engine

The Data Masking transformation is supported with the following restrictions.

Mapping validation fails in the following situations:

The transformation is configured for repeatable expression masking.

The transformation is configured for unique repeatable substitution masking.

You can use the following masking techniques on the Spark engine:

Credit Card

Expression

IP Address

Key

Phone

Random

SIN

SSN

Tokenization

URL

Random Substitution

Repeatable Substitution

Dependent with Random Substitution

Dependent with Repeatable Substitution

To optimize performance of the Data Masking transformation, configure the following Spark engine configuration properties in the Hadoop connection:

spark.executor.cores: Indicates the number of cores that each executor process uses to run tasklets on the Spark engine.
Set to:
spark.executor.cores=1

spark.executor.instances: Indicates the number of executor processes that run tasklets on the Spark engine.
Set to:
spark.executor.instances=1

spark.executor.memory: Indicates the amount of memory that each executor process uses to run tasklets on the Spark engine.
Set to:
spark.executor.memory=3G
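
The three settings above can be collected into a single set of name=value pairs. The following is a minimal Python sketch, not part of the product: the names MASKING_SPARK_PROPERTIES, render_properties, and total_executor_cores are hypothetical and exist only for illustration. It assumes the properties are entered as one name=value pair per line, and it shows that one executor with one core gives a single core in total for running tasklets.

```python
# Hypothetical helper names for illustration only; the property names and
# values are the ones recommended in this section.
MASKING_SPARK_PROPERTIES = {
    "spark.executor.cores": "1",
    "spark.executor.instances": "1",
    "spark.executor.memory": "3G",
}


def render_properties(props):
    """Return the properties as newline-separated name=value pairs."""
    return "\n".join(f"{name}={value}" for name, value in props.items())


def total_executor_cores(props):
    """Return total cores across executors: instances times cores per executor."""
    instances = int(props["spark.executor.instances"])
    cores = int(props["spark.executor.cores"])
    return instances * cores


if __name__ == "__main__":
    # Print the combined property block, then the resulting core count.
    print(render_properties(MASKING_SPARK_PROPERTIES))
    print("total executor cores:", total_executor_cores(MASKING_SPARK_PROPERTIES))
```

The sketch only formats and summarizes the values. How the pairs are entered in the Hadoop connection depends on your environment.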