Use Case: Operationalize a Pre-Trained Model

You work for a pharmaceutical company and study flower formation in foxglove as part of your research into better treatments for heart disease. You want to find out whether the common foxglove (Digitalis purpurea) or the woolly foxglove (Digitalis lanata) can provide a better prognosis for the development of a disease.
To perform your research, you must classify data on the length and width of the flower sepals and petals by flower species. To classify the data, you developed a pre-trained model outside of the Developer tool.
You operationalize the pre-trained model in the Developer tool. In the Developer tool, you create a mapping that contains a passive Python transformation. In the Python transformation, you list the pre-trained model as a resource file. You write a Python script that accesses the pre-trained model. You pass the data on flower sepals and petals to the Python transformation to classify the data by foxglove species.
The following image shows the mapping that you might create:
This image shows a mapping in the Developer tool. The mapping contains a Read transformation, a Python transformation, and a Write transformation. The Read transformation contains the following ports: sepal_length, sepal_width, petal_length, petal_width, and true_class. The ports are linked to the downstream Python transformation. The ports are input ports in the Python transformation. Output ports are configured in the Python transformation based on the input ports. The output ports in the Python transformation are linked to the downstream Write transformation.
The passive Python transformation uses the following components:
Resource File
Specify the path of the pre-trained model as the resource file.
For example, you might use a pre-trained model that is stored in the file foxgloveDataMLmodel.pkl in the following path:
/data/home/dtmqa/data/foxgloveDataMLmodel.pkl
Python Code
Specify the Python code on the Pre-Input and On Input tabs.
Use the Pre-Input tab to import libraries, load the resource file, and initialize variables.
For example, you might enter the following code on the Pre-Input tab:
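    # Import the libraries that the script needs.
    # Note: newer scikit-learn releases no longer ship sklearn.externals.joblib.
    # On those releases, use "import joblib" instead.
    from sklearn import svm
    from sklearn.externals import joblib
    import numpy as np

    # Load the pre-trained model from the first resource file.
    clf = joblib.load(resourceFileArrays[0])
    # Class names, indexed by the integer label that the model predicts.
    classes = ['common', 'woolly']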
On the On Input tab, define how the Python transformation uses the pre-trained model to evaluate each row of data.
For example, you might enter the following code on the On Input tab:
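    # Assemble the current row as a 1 x 4 array, the 2-D shape that predict() expects.
    features = [sepal_length, sepal_width, petal_length, petal_width]
    features = np.array(features).reshape(1, -1)

    # Classify the row and map the predicted integer label to a class name.
    pred = clf.predict(features)
    predicted_class = classes[pred[0]]

    # Pass the input values through to the output ports.
    sepal_length_out = sepal_length
    sepal_width_out = sepal_width
    petal_length_out = petal_length
    petal_width_out = petal_width
    true_class_out = true_class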
The Python transformation processes the data in the input ports according to the Python code and writes the data to the output ports.
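To check the script logic before you paste it into the tabs, you might dry-run the same flow outside the Developer tool. The following is a minimal sketch, not the Informatica runtime: it assumes that the Pre-Input code runs once and that the On Input code runs once for each row, as described above. CentroidStub, on_input, and the sample measurements are hypothetical stand-ins for the pre-trained model, the On Input tab, and the source data. The sketch uses only the Python standard library, so the stub replaces the scikit-learn model that joblib would load.

```python
class CentroidStub:
    """Hypothetical stand-in for the pre-trained model.

    It mimics the predict() contract used in the On Input code:
    a list of rows in, one integer class label per row out.
    """

    def __init__(self, centroids):
        # One list of four feature values per class index (made-up values).
        self.centroids = centroids

    def predict(self, rows):
        labels = []
        for row in rows:
            # Squared Euclidean distance from the row to each class centroid.
            distances = [
                sum((value - center) ** 2 for value, center in zip(row, centroid))
                for centroid in self.centroids
            ]
            labels.append(distances.index(min(distances)))
        return labels


# --- Pre-Input equivalent: runs once, before any row is processed. ---
# In the Developer tool, this is where the resource file would be loaded.
clf = CentroidStub([[5.0, 3.4, 1.5, 0.2], [6.5, 3.0, 5.5, 2.0]])
classes = ['common', 'woolly']


# --- On Input equivalent: runs once for each row. ---
def on_input(sepal_length, sepal_width, petal_length, petal_width, true_class):
    # A single row is still passed as a 2-D structure (one row, four features).
    features = [[sepal_length, sepal_width, petal_length, petal_width]]
    pred = clf.predict(features)
    return {
        'predicted_class': classes[pred[0]],
        'sepal_length_out': sepal_length,
        'sepal_width_out': sepal_width,
        'petal_length_out': petal_length,
        'petal_width_out': petal_width,
        'true_class_out': true_class,
    }


# Made-up sample rows in the port order of the Read transformation.
rows = [
    (5.1, 3.5, 1.4, 0.2, 'common'),
    (6.7, 3.1, 5.6, 2.4, 'woolly'),
]

results = [on_input(*row) for row in rows]
for result in results:
    print(result['predicted_class'], result['true_class_out'])
```

Each dictionary key corresponds to one output port of the Python transformation. Keeping the model load outside on_input mirrors the split between the two tabs: the model is loaded once, and only the classification is repeated for each row.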


Updated September 28, 2020