Python Transformation

If you upgrade to version 10.2.2, the Python transformation can process data more efficiently in Big Data Management.
To take advantage of the performance improvements, configure the following Spark advanced properties in the Hadoop connection:
infaspark.pythontx.exec
Required to run a Python transformation on the Spark engine for Data Engineering Integration. The location of the Python executable binary on the worker nodes in the Hadoop cluster.
For example, set to:
infaspark.pythontx.exec=/usr/bin/python3.4
If you use the installation of Python on the Data Integration Service machine, set the value to the Python executable binary in the Informatica installation directory on the Data Integration Service machine.
For example, set to:
infaspark.pythontx.exec=INFA_HOME/services/shared/spark/python/lib/python3.4
infaspark.pythontx.executorEnv.PYTHONHOME
Required to run a Python transformation on the Spark engine for Data Engineering Integration and Data Engineering Streaming. The location of the Python installation directory on the worker nodes in the Hadoop cluster.
For example, set to:
infaspark.pythontx.executorEnv.PYTHONHOME=/usr
If you use the installation of Python on the Data Integration Service machine, use the location of the Python installation directory on the Data Integration Service machine.
For example, set to:
infaspark.pythontx.executorEnv.PYTHONHOME=INFA_HOME/services/shared/spark/python/
After you configure the advanced properties, the Spark engine does not use Jep to run Python code in the Python transformation.
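The two properties are easy to set inconsistently, so a quick sanity check before you save the connection may help. The following is a minimal sketch, not an Informatica utility: the helper names `parse_properties` and `check_python_properties` are hypothetical, and the rule that the executable sits beneath the PYTHONHOME directory is an assumption drawn only from the example values above, not a documented requirement.

```python
import posixpath

# Property names as listed above.
EXEC_KEY = "infaspark.pythontx.exec"
HOME_KEY = "infaspark.pythontx.executorEnv.PYTHONHOME"


def parse_properties(text):
    """Parse key=value lines into a dict, trimming stray whitespace."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_python_properties(props):
    """Return a list of problems; an empty list means the check passed."""
    problems = [
        "missing property: " + key
        for key in (EXEC_KEY, HOME_KEY)
        if not props.get(key)
    ]
    if problems:
        return problems

    exec_path = posixpath.normpath(props[EXEC_KEY])
    python_home = posixpath.normpath(props[HOME_KEY])

    # Assumption (from the examples only): the executable lives somewhere
    # under the installation directory named by PYTHONHOME.
    try:
        shared = posixpath.commonpath([exec_path, python_home])
    except ValueError:
        # Raised when one path is absolute and the other is relative.
        shared = None
    if shared != python_home:
        problems.append(
            EXEC_KEY + " (" + exec_path + ") is not under "
            + HOME_KEY + " (" + python_home + ")"
        )
    return problems


if __name__ == "__main__":
    # Values taken from the worker-node example above.
    sample = (
        "infaspark.pythontx.exec=/usr/bin/python3.4\n"
        "infaspark.pythontx.executorEnv.PYTHONHOME=/usr\n"
    )
    print(check_python_properties(parse_properties(sample)))
```

The sketch only compares the two strings; it does not confirm that the paths exist on the worker nodes or on the Data Integration Service machine.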
For information about installing Python, see the Informatica Big Data Management 10.2.2 Integration Guide.
