Release Notes (10.4.1.3)

Post-Installation Tasks for the Python Transformation

After you install or upgrade, verify that the worker nodes on the Hadoop cluster contain an installation of Python before you use the Python transformation. If you previously installed Python in the directory <Informatica installation directory>/services/shared/spark/python, you must reinstall Python. Complete the tasks that correspond to the product that you use.

Installing Python for Data Engineering Integration

To use the Python transformation in a mapping, the worker nodes on the cluster must contain a uniform installation of Python. You can ensure that the installation is uniform in one of the following ways:
  • Verify that the Python installation exists. Verify that all worker nodes on the cluster contain an installation of Python in the same directory, such as /usr/lib/python, and that each Python installation contains all required modules. Additionally, verify that the Spark advanced property infaspark.pythontx.executorEnv.PYTHONHOME in the Hadoop connection is configured based on the directory that stores the Python installation.
  • Install Python. Install Python on every Data Integration Service machine. You can create a custom installation of Python that contains specific modules that you can reference in the Python code. When you run mappings, the Python installation is propagated to the worker nodes on the cluster.
If you choose to install Python on the Data Integration Service machines, complete the following tasks:
  1. Install Python.
  2. Optionally, install any third-party libraries such as numpy, scikit-learn, and cv2. You can access the third-party libraries in the Python transformation.
  3. Copy the Python installation folder to the following location on the Data Integration Service machine:
    <Informatica installation directory>/services/shared/spark/python
    If the Data Integration Service machine already contains an installation of Python, you can copy the existing Python installation to this location.
Changes take effect after you recycle the Data Integration Service.
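The requirement that each Python installation contain all required modules can be checked with a short script. The following is a minimal sketch, not part of the product; the list of module names is a hypothetical placeholder that you would replace with the modules your Python transformation code imports. Run it with each node's Python interpreter:

```python
import importlib.util

def find_missing_modules(required):
    """Return the subset of module names that this Python
    installation cannot import."""
    return [name for name in required
            if importlib.util.find_spec(name) is None]

# Hypothetical requirements list; replace with the modules that
# your Python transformation code actually references.
required_modules = ["json", "csv", "numpy"]

missing = find_missing_modules(required_modules)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are present.")
```

If the script reports missing modules on any worker node, the installation is not uniform and mappings that reference those modules will fail on that node.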

Installing Python for Data Engineering Streaming

To use the Python transformation in a streaming mapping, you must install Python and the Jep package. The Python version that you use must be compatible with Jep. You can use one of the following versions of Python:
  • 2.7
  • 3.3
  • 3.4
  • 3.5
  • 3.6

To install Python and Jep, complete the following tasks:
  1. Install Python with the --enable-shared option to ensure that shared libraries are accessible to Jep.
  2. Install Jep. Consider the following installation options:
    • Run pip install jep. Use this option if the Python installation includes the pip package.
    • Configure the Jep binaries manually. Ensure that jep.jar can be accessed by Java classloaders, that the shared Jep library can be accessed by Java, and that the Jep Python files can be accessed by Python.
  3. Optionally, install any third-party libraries such as numpy, scikit-learn, and cv2. You can access the third-party libraries in the Python transformation.
  4. Copy the Python installation folder to the following location on the Data Integration Service machine:
    <Informatica installation directory>/services/shared/spark/python
    If the Data Integration Service machine already contains an installation of Python, you can copy the existing Python installation to this location.
Changes take effect after you recycle the Data Integration Service.
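The constraints in steps 1 and 2 above can be verified programmatically. The following sketch, with the Jep-compatible versions from this section hardcoded, checks the running interpreter's (major, minor) version and reads the Py_ENABLE_SHARED build flag, which indicates whether the interpreter was built with --enable-shared:

```python
import sys
import sysconfig

# Python versions listed in this section as compatible with Jep.
JEP_COMPATIBLE = {(2, 7), (3, 3), (3, 4), (3, 5), (3, 6)}

def is_jep_compatible(version_info):
    """Return True if the (major, minor) pair of the given version
    tuple is one of the documented Jep-compatible versions."""
    return tuple(version_info[:2]) in JEP_COMPATIBLE

if __name__ == "__main__":
    if not is_jep_compatible(sys.version_info):
        print("Python %d.%d is not a Jep-compatible version."
              % tuple(sys.version_info[:2]))
    # A truthy Py_ENABLE_SHARED means the interpreter was built with
    # --enable-shared, so libpython is available as a shared library.
    if not sysconfig.get_config_var("Py_ENABLE_SHARED"):
        print("This interpreter was not built with --enable-shared.")
```

Running the script with the Python interpreter you intend to copy to the Data Integration Service machine flags both problems before you install Jep against it.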