Table of Contents

Search

  1. Preface
  2. Introduction to Big Data Management Administration
  3. Big Data Management Engines
  4. Authentication and Authorization
  5. Running Mappings on a Cluster with Kerberos Authentication
  6. Configuring Access to an SSL/TLS-Enabled Cluster
  7. Cluster Configuration
  8. Cluster Configuration Privileges and Permissions
  9. Cloud Provisioning Configuration
  10. Queuing
  11. Tuning for Big Data Processing
  12. Connections
  13. Multiple Blaze Instances on a Cluster

Big Data Management Administrator Guide

Big Data Management Administrator Guide

Spark Engine Advanced Properties

Spark Engine Advanced Properties

Spark advanced properties are a list of advanced or custom properties that are unique to the Spark engine. Each property contains a name and a value. You can add or edit advanced properties.
To edit the property in the text box, use the following format with &: to separate each name-value pair:
<name1>=<value1>[&:<name2>=<value2>…&:<nameN>=<valueN>]
Configure the following properties in the
Advanced Properties
of the Spark configuration section:
spark.scheduler.maxRegisteredResourcesWaitingTime
The number of milliseconds to wait for resources to register before scheduling a task. Default is 30000. Decrease the value to reduce delays before starting the Spark job execution. Required to improve performance for mappings on the Spark engine.
Set to 15000.
For example,
spark.scheduler.maxRegisteredResourcesWaitingTime=15000
spark.scheduler.minRegisteredResourcesRatio
The minimum ratio of registered resources to acquire before task scheduling begins. Default is 0.8. Decrease the value to reduce any delay before starting the Spark job execution. Required to improve performance for mappings on the Spark engine.
Set to: 0.5
For example,
spark.scheduler.minRegisteredResourcesRatio=0.5
spark.shuffle.encryption.enabled
Enables encrypted communication when authentication is enabled. Required for Spark encryption.
Set to TRUE.
For example,
spark.shuffle.encryption.enabled=TRUE
spark.authenticate
Enables authentication for the Spark service on Hadoop. Required for Spark encryption.
Set to TRUE.
For example,
spark.authenticate=TRUE
spark.authenticate.enableSaslEncryption
Enables encrypted communication when SASL authentication is enabled. Required if Spark encryption uses SASL authentication.
Set to TRUE.
For example,
spark.authenticate.enableSaslEncryption=TRUE
spark.authenticate.sasl.encryption.aes.enabled
Enables AES support when SASL authentication is enabled. Required if Spark encryption uses SASL authentication.
Set to TRUE.
For example,
spark.authenticate.sasl.encryption.aes.enabled=TRUE
infaspark.pythontx.executorEnv.LD_PRELOAD
The location of the Python shared library in the Python installation folder on the Data Integration Service machine. Required to run a Python transformation on the Spark engine.
For example, set to:
infaspark.pythontx.executorEnv.LD_PRELOAD= <Informatica installation directory>/services/shared/spark/python/lib/libpython3.6m.so
infaspark.pythontx.submit.lib.JEP_HOME
The location of the Jep package in the Python installation folder on the Data Integration Service machine. Required to run a Python transformation on the Spark engine.
For example, set to:
infaspark.pythontx.submit.lib.JEP_HOME= <Informatica installation directory>/services/shared/spark/python/lib/python3.6/site-packages/jep/
infaspark.executor.extraJavaOptions
List of extra Java options for the Spark executor. Required for streaming mappings to read from or write to a Kafka cluster that uses Kerberos authentication.
For example, set to:
infaspark.executor.extraJavaOptions= -Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M -Djavax.security.auth.useSubjectCredsOnly=true -Djava.security.krb5.conf=/<path to krb5.conf file>/krb5.conf -Djava.security.auth.login.config=/<path to jAAS config>/kafka_client_jaas.config
To configure the property for a specific user, you can include the following lines of code:
infaspark.executor.extraJavaOptions = -Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -Djava.security.krb5.conf=/etc/krb5.conf
infaspark.driver.cluster.mode.extraJavaOptions
List of extra Java options for the Spark driver that runs inside the cluster. Required for streaming mappings to read from or write to a Kafka cluster that uses Kerberos authentication.
For example, set to:
infaspark.driver.cluster.mode.extraJavaOptions= -Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M -Djavax.security.auth.useSubjectCredsOnly=true -Djava.security.krb5.conf=/<path to keytab file>/krb5.conf -Djava.security.auth.login.config=<path to jaas config>/kafka_client_jaas.config
To configure the property for a specific user, you can include the following lines of code:
infaspark.driver.cluster.mode.extraJavaOptions = -Djava.security.egd=file:/dev/./urandom -XX:MaxMetaspaceSize=256M -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -Djava.security.krb5.conf=/etc/krb5.conf

0 COMMENTS

We’d like to hear from you!