Use the Python transformation to execute Python code in a mapping that runs on the Spark or Databricks Spark engine.
The Python transformation provides an interface to define transformation functionality using Python code.
Python uses simple syntax, dynamic typing, and dynamic binding, which makes it well suited to rapid application development. When you use Python code in a data engineering mapping, the code is embedded into the generated Scala code that the Spark or Databricks Spark engine runs to process large, diverse, and fast-changing data sets.
You can also use the Python transformation for machine learning. In the transformation, you can specify a resource file that contains a pre-trained model and load the pre-trained model in the Python code. For example, you can load a pre-trained model to classify input data or to create predictions.
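The load-then-predict pattern described above can be sketched in plain Python. This is a minimal illustration, not the transformation's own interface: the JSON model format, the file name `model.json`, and the helper names `load_model` and `predict` are all assumptions made for the example, and a temporary file stands in for the resource file that you would specify in the transformation.

```python
import json
import math
import os
import tempfile


def load_model(path):
    """Read model parameters from the resource file once, before any rows are scored."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def predict(model, features):
    """Score one row with a logistic model and return (label, probability)."""
    score = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
    probability = 1.0 / (1.0 + math.exp(-score))
    label = "positive" if probability >= 0.5 else "negative"
    return label, probability


# Hypothetical stand-in for a resource file that holds a pre-trained model.
resource_path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(resource_path, "w", encoding="utf-8") as handle:
    json.dump({"weights": [2.0, -1.0], "bias": 0.0}, handle)

# Load the model a single time, then classify each input row.
model = load_model(resource_path)
rows = [[1.0, 0.5], [0.0, 3.0]]
results = [predict(model, row) for row in rows]
labels = [label for label, _ in results]
print(labels)
```

Loading the model once and reusing it for every row avoids re-reading the file for each record, which matters when the engine processes large data sets.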
Before you can use the Python transformation, configure the corresponding Spark advanced properties in the Hadoop connection or Databricks connection properties. Then, ensure that the worker nodes on the cluster contain an installation of Python.
For more information about installing Python, see the Data Engineering Integration Guide.
The Data Integration Service and the Blaze engine do not support the Python transformation.