Prepare a Python Installation

If you want to use the Python transformation, you must ensure that the worker nodes on the Hadoop cluster contain an installation of Python. You must complete different tasks depending on the product that you use.

Installing Python for Data Engineering Integration

To use the Python transformation in a mapping, the worker nodes on the cluster must contain a uniform installation of Python. You can ensure that the installation is uniform in one of the following ways:
  • Verify that the Python installation exists.
    Verify that all worker nodes on the cluster contain an installation of Python in the same directory, such as /usr/lib/python, and that each Python installation contains all required modules. You do not reinstall Python, but you must reconfigure the following Spark advanced property in the Hadoop connection:
    infaspark.pythontx.executorEnv.PYTHONHOME
  • Install Python.
    Install Python on every Data Integration Service machine. You can create a custom installation of Python that contains specific modules that you can reference in the Python code. When you run mappings, the Python installation is propagated to the worker nodes on the cluster.
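If you verify an existing installation rather than install Python, a small script run on each worker node can help confirm that the installations match. The following is a minimal sketch, not an Informatica utility. The function name check_python_install and the module list are hypothetical, and the sketch assumes that sys.prefix of the interpreter that runs it is the directory you intend to use for PYTHONHOME.

```python
import importlib.util
import sys

# Hypothetical list: replace with the modules that your Python code imports.
# scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["numpy", "sklearn", "cv2"]


def check_python_install(required_modules):
    """Return this interpreter's home directory and the modules it cannot find."""
    missing = [
        name for name in required_modules
        if importlib.util.find_spec(name) is None  # None means not importable
    ]
    # Assumption: sys.prefix is the directory that PYTHONHOME should point to.
    return {"python_home": sys.prefix, "missing": missing}


if __name__ == "__main__":
    report = check_python_install(REQUIRED_MODULES)
    print("Python home:", report["python_home"])
    print("Missing modules:", report["missing"] or "none")
```

Run the script with the Python interpreter that you want to verify on each worker node. The reported home directory should be identical on every node, and the list of missing modules should be empty.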
If you choose to install Python on the Data Integration Service machines, complete the following tasks:
  1. Install Python.
  2. Optionally, install any third-party libraries such as numpy, scikit-learn, and cv2. You can access the third-party libraries in the Python transformation.
  3. Copy the Python installation folder to the following location on the Data Integration Service machine:
    <Informatica installation directory>/services/shared/spark/python
    If the Data Integration Service machine already contains an installation of Python, you can copy the existing Python installation to the above location.
Changes take effect after you recycle the Data Integration Service.
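The copy step can also be scripted. The following is a minimal sketch under stated assumptions: the function name copy_python_install is hypothetical, the Informatica installation directory is passed in because its value is site-specific, and the sketch assumes that the contents of the Python installation folder should sit directly under services/shared/spark/python. Confirm the expected layout for your environment before you rely on it.

```python
import shutil
from pathlib import Path


def copy_python_install(python_dir, infa_home):
    """Copy a Python installation folder into the shared Spark location.

    python_dir: path to the existing Python installation folder.
    infa_home:  the Informatica installation directory (site-specific).
    """
    source = Path(python_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"No Python installation folder at {source}")
    # Assumption: the folder contents go directly into .../spark/python.
    target = Path(infa_home) / "services" / "shared" / "spark" / "python"
    # symlinks=True keeps symbolic links as links instead of copying targets;
    # dirs_exist_ok=True (Python 3.8 or later) allows an existing target folder.
    shutil.copytree(source, target, symlinks=True, dirs_exist_ok=True)
    return target
```

Call the function with the path of the Python installation folder and the Informatica installation directory, and then recycle the Data Integration Service as described above.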

Installing Python for Data Engineering Streaming

To use the Python transformation in Data Engineering Streaming, you must install Python and the Jep package. Because you must install Jep, the Python version that you use must be compatible with Jep. To learn about supported versions, review the Product Availability Matrix. You can find the Product Availability Matrix on Informatica Network: https://network.informatica.com/community/informatica-network/product-availability-matrices/overview
To install Python and Jep, complete the following tasks:
  1. Install Python with the --enable-shared option to ensure that shared libraries are accessible by Jep.
  2. Install Jep. To install Jep, consider the following installation options:
    • Run pip install jep. Use this option if Python is installed with the pip package.
    • Configure the Jep binaries. Ensure that jep.jar can be accessed by Java classloaders, the shared Jep library can be accessed by Java, and Jep Python files can be accessed by Python.
  3. Optionally, install any third-party libraries such as numpy, scikit-learn, and cv2. You can access the third-party libraries in the Python transformation.
  4. Copy the Python installation folder to the following location on the Data Integration Service machine:
    <Informatica installation directory>/services/shared/spark/python
    If the Data Integration Service machine already contains an installation of Python, you can copy the existing Python installation to the above location.
Changes take effect after you recycle the Data Integration Service.
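Before you copy the installation, you might want to confirm that Python was built with shared libraries and that Jep is importable. The following is a minimal sketch, not part of the product. The function name check_jep_prerequisites is hypothetical, and the check relies on the Py_ENABLE_SHARED build variable, which is defined on POSIX builds and might be absent on other platforms.

```python
import importlib.util
import sysconfig


def check_jep_prerequisites():
    """Report whether this Python has a shared-library build and can find Jep."""
    # Py_ENABLE_SHARED is 1 on POSIX builds configured with --enable-shared.
    # It can be None on platforms that do not define it, such as Windows.
    shared = sysconfig.get_config_var("Py_ENABLE_SHARED")
    return {
        "enable_shared": bool(shared),
        "jep_found": importlib.util.find_spec("jep") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_jep_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Run the script with the interpreter from the installation folder that you plan to copy. A result of "no" for enable_shared on a POSIX system suggests that Python was built without the --enable-shared option.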
