Data Masking Transformation on the Spark Engine

The Data Masking transformation is supported with the following restrictions.
Mapping validation fails in the following situations:
  • The transformation is configured for repeatable expression masking.
  • The transformation is configured for unique repeatable substitution masking.
You can use the following masking techniques on this engine:
  • Credit Card
  • Email
  • Expression
  • IP Address
  • Key
  • Phone
  • Random
  • SIN
  • SSN
  • Tokenization
  • URL
  • Random Substitution
  • Repeatable Substitution
  • Dependent with Random Substitution
  • Dependent with Repeatable Substitution
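
The restrictions and the list of techniques above can be read as a simple pre-check to run before you validate a mapping. The following Python sketch is illustrative only and is not part of any Informatica API. The function name and the `repeatable` and `unique` parameters are hypothetical stand-ins for the corresponding transformation options, and the sketch assumes that the unique restriction applies to the Repeatable Substitution technique.

```python
# Techniques that this section lists as usable on the Spark engine.
SPARK_MASKING_TECHNIQUES = {
    "Credit Card", "Email", "Expression", "IP Address", "Key", "Phone",
    "Random", "SIN", "SSN", "Tokenization", "URL",
    "Random Substitution", "Repeatable Substitution",
    "Dependent with Random Substitution",
    "Dependent with Repeatable Substitution",
}


def spark_validation_issue(technique, repeatable=False, unique=False):
    """Return a reason string if the configuration is expected to fail
    mapping validation on the Spark engine, or None if no issue is known.

    `repeatable` and `unique` are hypothetical flags used for illustration;
    they are not actual Informatica property names.
    """
    if technique not in SPARK_MASKING_TECHNIQUES:
        return f"{technique!r} is not listed as usable on the Spark engine"
    # Restriction 1: repeatable expression masking fails validation.
    if technique == "Expression" and repeatable:
        return "repeatable expression masking fails mapping validation"
    # Restriction 2: unique repeatable substitution masking fails validation.
    # Assumption: this applies to the Repeatable Substitution technique.
    if technique == "Repeatable Substitution" and unique:
        return "unique repeatable substitution masking fails mapping validation"
    return None
```

For example, `spark_validation_issue("Expression", repeatable=True)` returns a reason string, while `spark_validation_issue("Email")` returns `None`.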

To optimize performance of the Data Masking transformation, configure the following Spark engine configuration properties in the Hadoop connection:
spark.executor.cores
Indicates the number of cores that each executor process uses to run tasklets on the Spark engine.
Set to:
spark.executor.cores=1
spark.executor.instances
Indicates the number of executor processes that run tasklets on the Spark engine.
Set to:
spark.executor.instances=1
spark.executor.memory
Indicates the amount of memory that each executor process uses to run tasklets on the Spark engine.
Set to:
spark.executor.memory=3G
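
As a quick way to check a connection against the values above, the following Python sketch renders the three properties as `name=value` lines and reports any that differ from the recommendation. It is a minimal illustration, not an Informatica utility. The function names and the idea of passing the current settings as a dictionary are assumptions made for this example.

```python
# Values recommended in this section for the Data Masking transformation.
RECOMMENDED_SPARK_PROPERTIES = {
    "spark.executor.cores": "1",
    "spark.executor.instances": "1",
    "spark.executor.memory": "3G",
}


def render_properties(props):
    """Render properties as name=value lines, one property per line."""
    return "\n".join(f"{name}={value}" for name, value in props.items())


def find_mismatches(current, recommended=RECOMMENDED_SPARK_PROPERTIES):
    """Return {name: (current_value, recommended_value)} for each property
    whose current value is missing or differs from the recommendation."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in recommended.items()
        if current.get(name) != wanted
    }


if __name__ == "__main__":
    # Hypothetical current settings, for illustration only.
    current = {"spark.executor.cores": "2", "spark.executor.memory": "3G"}
    print(render_properties(RECOMMENDED_SPARK_PROPERTIES))
    print(find_mismatches(current))
```

In this example, `find_mismatches` flags `spark.executor.cores` because its value differs and `spark.executor.instances` because it is not set.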


Updated September 28, 2020