Example: Use Partitions to Find the Highest Salary

You are an HR staff member at your organization. You are working on a project to model how employee salaries are associated with aspects of life that employees find important. You want to use the information to better personalize the organization's wellness program.
You can use the Python transformation to complete the first part of the project: determining which employee earns the highest salary in each department.
The following table shows the data that your organization might collect:
DepartmentName  DepartmentID  EmployeeName      SalaryIndex  EmployeeSince
HR              1             Jane Smith        500          2/16/2010
R&D             2             Ellioth Consar    150          3/29/2018
Finance         3             Concor Valashe    230          11/22/2007
Marketing       4             Manchini Voliore  800          5/17/2009
HR              1             Blaze Concave     501          8/25/2016
R&D             2             Janet Encarr      890          1/26/2019
HR              1             Chelsea Blanch    389          9/3/2018
R&D             1             Samuel Coin       10           1/26/2005
To use the Python transformation for the task, complete the following steps:
Step 1. Add a Python transformation to the mapping.
Create an active Python transformation.
Step 2. Pass data to the Python transformation.
Pass the following ports from upstream transformations in the mapping to the Python transformation:
  • DepartmentName
  • DepartmentID
  • EmployeeName
  • SalaryIndex
  • EmployeeSince
Step 3. Partition the data by department.
Partition the data by department to track the highest salary within each department. To partition the data by department, select the input port DepartmentID as the partition key.
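The effect of a partition key can be pictured with a short standalone sketch. It is only a conceptual model written in plain Python, not a description of how the engine distributes rows at run time. The helper `partition_by_key` and the shortened sample rows are assumptions made for illustration; they are not part of the Python transformation.

```python
from collections import defaultdict

# A few sample rows from the table above:
# (DepartmentName, DepartmentID, EmployeeName, SalaryIndex)
rows = [
    ("HR", 1, "Jane Smith", 500),
    ("R&D", 2, "Ellioth Consar", 150),
    ("HR", 1, "Blaze Concave", 501),
    ("R&D", 2, "Janet Encarr", 890),
]


def partition_by_key(rows, key_index):
    """Group rows into buckets by the value at key_index (hypothetical helper)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row)
    return dict(partitions)


# Index 1 is the DepartmentID column, which plays the role of the partition key.
partitions = partition_by_key(rows, key_index=1)
for department_id, members in partitions.items():
    print(department_id, [name for _, _, name, _ in members])
```

Because every row with the same DepartmentID lands in the same bucket, a running maximum kept for one bucket never has to be compared with rows from another department.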
Step 4. Create output ports.
On the Ports tab in the Python transformation, create the following output ports to pass data to downstream transformations:
  • DepartmentName_out
  • DepartmentID_out
  • EmployeeName_out
  • SalaryIndex_out
  • EmployeeSince_out
Step 5. Initialize a map.
Declare a map variable outputmap to associate each department ID with the employee in the department who has the highest salary.
Add the following code on the Pre-Input tab:
    print("Using partitions to find the employee with the highest salary")
    outputmap = {}
Step 6. Define code to process the data.
For each input row that passes through the Python transformation, define code that checks if the employee's salary is higher than the maximum salary of the previous rows that have been processed. If the employee's salary is higher, update the employee who has the maximum salary in the department.
Add the following code on the On Input tab:
    DepartmentID_out = DepartmentID
    print("Processing rows for department ID " + str(DepartmentID_out))
    outputmap.setdefault(DepartmentID, None)
    updateMax = False
    if outputmap.get(DepartmentID, None) is None:
        # First row seen for this department.
        updateMax = True
    else:
        max_salary = outputmap[DepartmentID]['SalaryIndex']
        if max_salary is None:
            updateMax = True
        # elif avoids comparing a number with None.
        elif SalaryIndex > max_salary:
            updateMax = True
    if updateMax:
        # Remember this employee as the current maximum for the department.
        employee_data = {'SalaryIndex': SalaryIndex, 'EmployeeName': EmployeeName,
                         'EmployeeSince': EmployeeSince, 'DepartmentName': DepartmentName}
        outputmap[DepartmentID] = employee_data
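The per-row update rule can also be tried outside the transformation. The following is a minimal sketch that assumes the same dictionary layout as the code above; the function `update_max` is a hypothetical stand-in for the On Input logic, and the three HR rows come from the sample table.

```python
def update_max(outputmap, department_id, employee):
    """Store `employee` if it beats the department's current maximum.

    Returns True when the stored record changed. Hypothetical helper that
    mirrors the On Input logic; it is not part of the transformation API.
    """
    current = outputmap.get(department_id)
    if (
        current is None
        or current["SalaryIndex"] is None
        or employee["SalaryIndex"] > current["SalaryIndex"]
    ):
        outputmap[department_id] = employee
        return True
    return False


outputmap = {}
changes = [
    update_max(outputmap, 1, {"EmployeeName": "Jane Smith", "SalaryIndex": 500}),
    update_max(outputmap, 1, {"EmployeeName": "Blaze Concave", "SalaryIndex": 501}),
    update_max(outputmap, 1, {"EmployeeName": "Chelsea Blanch", "SalaryIndex": 389}),
]
print(changes)                       # [True, True, False]
print(outputmap[1]["EmployeeName"])  # Blaze Concave
```

The comparison is strictly greater than, so when two employees in a department share the top salary, the one processed first is kept.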
Step 7. Write the data to the output ports.
On the At End tab, use the data in the map variable outputmap to generate a row for the employee that has the highest salary in each department.
Add the following code on the At End tab:
    for x in outputmap:
        DepartmentID_out = x
        smap = outputmap[x]
        SalaryIndex_out = smap["SalaryIndex"]
        EmployeeName_out = smap["EmployeeName"]
        DepartmentName_out = smap["DepartmentName"]
        EmployeeSince_out = smap["EmployeeSince"]
        ## Generate the output row
        generateRow()
Step 8. Run the mapping.
If the output ports in the Python transformation are linked directly to a Write transformation, the target contains the following data after you run the mapping:
DepartmentName  DepartmentID  EmployeeName      SalaryIndex  EmployeeSince
Finance         3             Concor Valashe    230          11/22/2007
Marketing       4             Manchini Voliore  800          5/17/2009
HR              1             Blaze Concave     501          8/25/2016
R&D             2             Janet Encarr      890          1/26/2019
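To check the expected result without running a mapping, the three code tabs can be approximated in one plain Python function. This is a sketch under stated assumptions: `find_top_earners` is a hypothetical name, the rows are supplied as tuples instead of input ports, and a returned list takes the place of the rows that generateRow() would emit.

```python
# Sample data from the input table:
# (DepartmentName, DepartmentID, EmployeeName, SalaryIndex, EmployeeSince)
rows = [
    ("HR", 1, "Jane Smith", 500, "2/16/2010"),
    ("R&D", 2, "Ellioth Consar", 150, "3/29/2018"),
    ("Finance", 3, "Concor Valashe", 230, "11/22/2007"),
    ("Marketing", 4, "Manchini Voliore", 800, "5/17/2009"),
    ("HR", 1, "Blaze Concave", 501, "8/25/2016"),
    ("R&D", 2, "Janet Encarr", 890, "1/26/2019"),
    ("HR", 1, "Chelsea Blanch", 389, "9/3/2018"),
    ("R&D", 1, "Samuel Coin", 10, "1/26/2005"),
]


def find_top_earners(rows):
    """Approximate the Pre-Input, On Input, and At End tabs (hypothetical helper)."""
    # Pre-Input: start with an empty map of DepartmentID -> best employee so far.
    outputmap = {}

    # On Input: runs once per row and keeps the highest SalaryIndex per department.
    for dept_name, dept_id, employee, salary, since in rows:
        best = outputmap.get(dept_id)
        if best is None or salary > best["SalaryIndex"]:
            outputmap[dept_id] = {
                "DepartmentName": dept_name,
                "EmployeeName": employee,
                "SalaryIndex": salary,
                "EmployeeSince": since,
            }

    # At End: one output row per department, standing in for generateRow().
    return [
        (rec["DepartmentName"], dept_id, rec["EmployeeName"],
         rec["SalaryIndex"], rec["EmployeeSince"])
        for dept_id, rec in outputmap.items()
    ]


result = find_top_earners(rows)
for row in result:
    print(row)
```

The sketch yields the same four employees as the target table. The row order can differ, because the sketch emits departments in the order they first appear in the input.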


Updated September 28, 2020