Data Masking Transformation on the Spark Engine

The Data Masking transformation is supported with the following restrictions.
Mapping validation fails in the following situations:
  • The transformation is configured for repeatable expression masking.
  • The transformation is configured for unique repeatable substitution masking.
You can use the following masking techniques on the Spark engine:

  • Credit Card
  • Email
  • Expression
  • IP Address
  • Key
  • Phone
  • Random
  • SIN
  • SSN
  • Tokenization
  • URL
  • Random Substitution
  • Repeatable Substitution
  • Dependent with Random Substitution
  • Dependent with Repeatable Substitution
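
The restrictions and the list of techniques above can be read together as a simple pre-check. The following Python sketch is illustrative only and is not part of the product: the function `spark_masking_issues` and its `repeatable` and `unique` flags are hypothetical names chosen to model the two documented validation failures.

```python
# Techniques listed in this topic as usable on the Spark engine.
SPARK_SUPPORTED = frozenset({
    "Credit Card", "Email", "Expression", "IP Address", "Key", "Phone",
    "Random", "SIN", "SSN", "Tokenization", "URL",
    "Random Substitution", "Repeatable Substitution",
    "Dependent with Random Substitution",
    "Dependent with Repeatable Substitution",
})


def spark_masking_issues(technique, repeatable=False, unique=False):
    """Return a list of reasons a masking configuration may be a problem on Spark.

    Hypothetical helper: 'repeatable' and 'unique' are assumed flags that
    stand in for the corresponding transformation options.
    """
    issues = []
    if technique not in SPARK_SUPPORTED:
        issues.append(f"{technique!r} is not listed for the Spark engine")
    if technique == "Expression" and repeatable:
        issues.append("repeatable expression masking fails mapping validation")
    if technique == "Repeatable Substitution" and unique:
        issues.append("unique repeatable substitution masking fails mapping validation")
    return issues


# Plain expression masking has no documented restriction.
print(spark_masking_issues("Expression"))
# Repeatable expression masking is one of the two documented failures.
print(spark_masking_issues("Expression", repeatable=True))
```

An empty list means that none of the restrictions in this topic applies. It does not guarantee that the mapping validates for other reasons.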

To optimize performance of the Data Masking transformation, configure the following Spark engine configuration properties in the Hadoop connection:
spark.executor.cores
Indicates the number of cores that each executor process uses to run tasklets on the Spark engine.
Set to:
spark.executor.cores=1
spark.executor.instances
Indicates the number of executor instances that run tasklets on the Spark engine.
Set to:
spark.executor.instances=1
spark.executor.memory
Indicates the amount of memory that each executor process uses to run tasklets on the Spark engine.
Set to:
spark.executor.memory=3G
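
To see what the three recommended values add up to, the following Python sketch parses them as name=value pairs and computes the resulting footprint. It is a minimal illustration under stated assumptions: the helper names `parse_properties` and `memory_in_gb` are hypothetical, only the G and M size suffixes are handled, and the memory figure excludes any overhead that the cluster manager adds on top of the executor memory.

```python
def parse_properties(text):
    """Parse name=value lines into a dict, skipping blank lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        props[name.strip()] = value.strip()
    return props


def memory_in_gb(value):
    """Convert a size such as '3G' or '512M' to gigabytes (G and M only)."""
    number, unit = float(value[:-1]), value[-1].upper()
    if unit == "G":
        return number
    if unit == "M":
        return number / 1024
    raise ValueError(f"unsupported unit in {value!r}")


# The values recommended in this topic.
RECOMMENDED = """
spark.executor.cores=1
spark.executor.instances=1
spark.executor.memory=3G
"""

props = parse_properties(RECOMMENDED)
cores = int(props["spark.executor.cores"])
instances = int(props["spark.executor.instances"])
memory_gb = memory_in_gb(props["spark.executor.memory"])

# Each executor runs up to 'cores' tasks at a time.
task_slots = cores * instances
total_memory_gb = memory_gb * instances

print(f"concurrent task slots: {task_slots}")
print(f"total executor memory: {total_memory_gb:g} GB")
```

With the recommended values, the sketch reports one concurrent task slot and 3 GB of executor memory, which reflects a single executor that processes one task at a time.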


Updated July 10, 2020