Table of Contents


  1. Preface
  2. Introduction to Informatica Big Data Management
  3. Mappings in the Hadoop Environment
  4. Mapping Sources in the Hadoop Environment
  5. Mapping Targets in the Hadoop Environment
  6. Mapping Transformations in the Hadoop Environment
  7. Processing Hierarchical Data on the Spark Engine
  8. Configuring Transformations to Process Hierarchical Data
  9. Processing Unstructured and Semi-structured Data with an Intelligent Structure Model
  10. Stateful Computing on the Spark Engine
  11. Monitoring Mappings in the Hadoop Environment
  12. Mappings in the Native Environment
  13. Profiles
  14. Native Environment Optimization
  15. Cluster Workflows
  16. Connections
  17. Data Type Reference
  18. Function Reference
  19. Parameter Reference

Big Data Management User Guide

Spark Engine Advanced Properties

Spark advanced properties are custom properties that are unique to the Spark engine. Each property contains a name and a value. You can add or edit advanced properties.
To edit properties in the text box, use the following format, where &: separates each name-value pair:
<name1>=<value1>[&:<name2>=<value2>…&:<nameN>=<valueN>]
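For example, the following minimal Python sketch shows how the separator works. The property names and values are the scheduler examples documented later in this section:

# Minimal sketch: build the Advanced Properties entry string.
# The properties and values are examples from this section.
properties = {
    "spark.scheduler.maxRegisteredResourcesWaitingTime": "15000",
    "spark.scheduler.minRegisteredResourcesRatio": "0.5",
}
# Join each name=value pair with the &: separator that the text box requires.
entry = "&:".join(f"{name}={value}" for name, value in properties.items())
print(entry)
# spark.scheduler.maxRegisteredResourcesWaitingTime=15000&:spark.scheduler.minRegisteredResourcesRatio=0.5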
Configure the following properties in the Advanced Properties field of the Spark configuration section:
spark.scheduler.maxRegisteredResourcesWaitingTime
The number of milliseconds to wait for resources to register before scheduling a task. Default is 30000. Decrease the value to reduce the delay before Spark job execution starts and to improve performance for mappings on the Spark engine.
Set to 15000.
For example,
spark.scheduler.maxRegisteredResourcesWaitingTime=15000
spark.scheduler.minRegisteredResourcesRatio
The minimum ratio of registered resources to acquire before task scheduling begins. Default is 0.8. Decrease the value to reduce the delay before Spark job execution starts and to improve performance for mappings on the Spark engine.
Set to 0.5.
For example,
spark.scheduler.minRegisteredResourcesRatio=0.5
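If you set both scheduler properties, the combined entry in the text box joins the two examples above with the &: separator:
spark.scheduler.maxRegisteredResourcesWaitingTime=15000&:spark.scheduler.minRegisteredResourcesRatio=0.5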
spark.shuffle.encryption.enabled
Enables encrypted communication when authentication is enabled. Required for Spark encryption.
Set to TRUE.
For example,
spark.shuffle.encryption.enabled=TRUE
spark.authenticate
Enables authentication for the Spark service on Hadoop. Required for Spark encryption.
Set to TRUE.
For example,
spark.authenticate=TRUE
spark.authenticate.enableSaslEncryption
Enables encrypted communication when SASL authentication is enabled. Required if Spark encryption uses SASL authentication.
Set to TRUE.
For example,
spark.authenticate.enableSaslEncryption=TRUE
spark.authenticate.sasl.encryption.aes.enabled
Enables AES support when SASL authentication is enabled. Required if Spark encryption uses SASL authentication.
Set to TRUE.
For example,
spark.authenticate.sasl.encryption.aes.enabled=TRUE
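The four encryption-related properties are typically set together. A combined entry that joins the examples above with the &: separator looks like the following:
spark.shuffle.encryption.enabled=TRUE&:spark.authenticate=TRUE&:spark.authenticate.enableSaslEncryption=TRUE&:spark.authenticate.sasl.encryption.aes.enabled=TRUE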
infaspark.pythontx.executorEnv.LD_PRELOAD
The location of the Python shared library in the Python installation folder on the Data Integration Service machine. Required to run a Python transformation on the Spark engine.
For example, set to:
infaspark.pythontx.executorEnv.LD_PRELOAD=<Informatica installation directory>/services/shared/spark/python/lib/libpython3.6m.so
infaspark.pythontx.submit.lib.JEP_HOME
The location of the Jep package in the Python installation folder on the Data Integration Service machine. Required to run a Python transformation on the Spark engine.
For example, set to:
infaspark.pythontx.submit.lib.JEP_HOME=<Informatica installation directory>/services/shared/spark/python/lib/python3.6/site-packages/jep/
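Before you set the two Python transformation properties, you can verify that both locations exist on the Data Integration Service machine. The following is a minimal Python sketch; the installation root shown is a hypothetical placeholder:

import os

# Hypothetical installation root; substitute your Informatica installation directory.
infa_home = "/opt/informatica"

# Candidate values for the two properties documented above.
ld_preload = os.path.join(infa_home, "services/shared/spark/python/lib/libpython3.6m.so")
jep_home = os.path.join(infa_home, "services/shared/spark/python/lib/python3.6/site-packages/jep")

# Report whether each path exists so a missing library is caught before run time.
for name, path in [("LD_PRELOAD", ld_preload), ("JEP_HOME", jep_home)]:
    status = "found" if os.path.exists(path) else "MISSING"
    print(f"{name}: {path} -> {status}")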
infaspark.executor.extraJavaOptions
List of extra Java options for the Spark executor. Required for streaming mappings to read from or write to a Kafka cluster that uses Kerberos authentication.
For example, set to:
infaspark.executor.extraJavaOptions=-Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M -Djavax.security.auth.useSubjectCredsOnly=true -Djava.security.krb5.conf=/<path to krb5.conf file>/krb5.conf -Djava.security.auth.login.config=/<path to JAAS config>/kafka_client_jaas.config
To configure the property for a specific user, you can include the following lines of code:
infaspark.executor.extraJavaOptions=-Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -Djava.security.krb5.conf=/etc/krb5.conf
infaspark.driver.cluster.mode.extraJavaOptions
List of extra Java options for the Spark driver that runs inside the cluster. Required for streaming mappings to read from or write to a Kafka cluster that uses Kerberos authentication.
For example, set to:
infaspark.driver.cluster.mode.extraJavaOptions=-Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M -Djavax.security.auth.useSubjectCredsOnly=true -Djava.security.krb5.conf=/<path to krb5.conf file>/krb5.conf -Djava.security.auth.login.config=/<path to JAAS config>/kafka_client_jaas.config
To configure the property for a specific user, you can include the following lines of code:
infaspark.driver.cluster.mode.extraJavaOptions=-Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -Djava.security.krb5.conf=/etc/krb5.conf
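Because these option strings are long and easy to mistype, you can assemble the value programmatically before pasting it into the text box. The following minimal Python sketch reproduces the Kerberos example above; the krb5.conf and JAAS configuration paths are hypothetical placeholders:

# Minimal sketch: assemble an extraJavaOptions value from its parts.
# The krb5.conf and JAAS paths below are hypothetical placeholders.
java_options = [
    "-Djava.security.egd=file:/dev/./urandom",
    "-XX:MaxMetaspaceSize=256M",
    "-Djavax.security.auth.useSubjectCredsOnly=true",
    "-Djava.security.krb5.conf=/etc/krb5.conf",
    "-Djava.security.auth.login.config=/etc/kafka_client_jaas.config",
]
# Java options are separated by single spaces in the property value.
print("infaspark.driver.cluster.mode.extraJavaOptions=" + " ".join(java_options))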
