Table of Contents

Search

  1. Preface
  2. Transformations
  3. Source transformation
  4. Target transformation
  5. Access Policy transformation
  6. Aggregator transformation
  7. B2B transformation
  8. Cleanse transformation
  9. Data Masking transformation
  10. Data Services transformation
  11. Deduplicate transformation
  12. Expression transformation
  13. Filter transformation
  14. Hierarchy Builder transformation
  15. Hierarchy Parser transformation
  16. Hierarchy Processor transformation
  17. Input transformation
  18. Java transformation
  19. Java transformation API reference
  20. Joiner transformation
  21. Labeler transformation
  22. Lookup transformation
  23. Machine Learning transformation
  24. Mapplet transformation
  25. Normalizer transformation
  26. Output transformation
  27. Parse transformation
  28. Python transformation
  29. Rank transformation
  30. Router transformation
  31. Rule Specification transformation
  32. Sequence transformation
  33. Sorter transformation
  34. SQL transformation
  35. Structure Parser transformation
  36. Transaction Control transformation
  37. Union transformation
  38. Velocity transformation
  39. Verifier transformation
  40. Web Services transformation

Transformations

Transformations

Example: Use partitions to find the highest salary

Example: Use partitions to find the highest salary

You are an HR staff member at your organization. You are working on a project to model how employee salaries are associated with aspects of life that employees find important. The project is part of the wellness program at your organization. You want to use the information to better personalize the wellness program.
You can use the Python transformation to determine which employee earns the highest salary in their department.
The following table shows the data that your organization might collect:
DepartmentName
DepartmentID
EmployeeName
SalaryIndex
EmployeeSince
HR
1
Jane Smith
500
2/16/2010
R&D
2
Ellioth Consar
150
3/29/2018
Finance
3
Concor Valashe
230
11/22/2007
Marketing
4
Manchini Voliore
800
5/17/2009
HR
1
Blaze Concave
501
8/25/2016
R&D
2
Janet Encarr
890
1/26/2019
HR
1
Chelsea Blanch
389
9/3/2018
R&D
1
Samuel Coin
10
1/26/2005
To use the Python transformation to determine which employee earns the highest salary in their department, perform the following tasks:
Step 1. Add a Python transformation to the mapping.
Create a Python transformation. On the
Advanced
tab, set the behavior to Active.
Step 2. Pass data to the Python transformation.
Pass the following fields from upstream transformations in the mapping to the Python transformation:
  • DepartmentName
  • DepartmentID
  • EmployeeName
  • SalaryIndex
  • EmployeeSince
Step 3. Partition the data by department.
Partition the data by department to track the highest salary within each department. To partition the data by department, add the incoming field
DepartmentID
as a partition key on the
Partition Keys
tab.
Step 4. Create output fields.
Create the following output fields on the
Output Fields
tab to pass data to downstream transformations:
  • DepartmentName_out
  • DepartmentID_out
  • EmployeeName_out
  • SalaryIndex_out
  • EmployeeSince_out
Step 5. Initialize a map.
Declare a map variable
outputmap
to associate each department ID with the employee in the department who has the highest salary.
Add the following code in the
Pre-Partition Python Code
section:
print("Using partitions to find the employee with the highest salary") outputmap = {}
Step 6. Define code to process the data.
For each input row that passes through the Python transformation, define code that checks if the salary of the employee is higher than the maximum salary of the previous rows that have been processed. If the salary of the employee is higher, update the employee who has the maximum salary in the department.
Add the following code in the
Main Python Code
section:
DepartmentID_out = DepartmentID print("Processing rows for department ID " + str(DepartmentID_out)) outputmap.setdefault(DepartmentID, None) updateMax = False if outputmap.get(DepartmentID, None) is None: updateMax = True else: max_salary = outputmap[DepartmentID]['SalaryIndex'] if max_salary is None: updateMax = True if SalaryIndex > max_salary: updateMax = True if updateMax == True: employee_data = {'SalaryIndex':SalaryIndex,'EmployeeName':EmployeeName, 'EmployeeSince':EmployeeSince,'DepartmentName':DepartmentName} outputmap[DepartmentID] = employee_data
Step 7. Write the data to the output files.
In the
Post-Partition Python Code
section of the
Python
tab, use the data in the map variable
outputmap
to generate a row for the employee that has the highest salary in each department.
Add the following code in the
Post-Partition Python Code
section:
for x in outputmap: DepartmentID_out = x smap = outputmap[x] SalaryIndex_out = smap["SalaryIndex"] EmployeeName_out = smap["EmployeeName"] DepartmentName_out = smap["DepartmentName"] EmployeeSince_out = smap["EmployeeSince"] ## Generate the output row generateRow()
Step 8. Run the mapping.
If the output fields in the Python transformation are linked directly to a Target transformation, the target contains the following data after you run the mapping:
DepartmentName
DepartmentID
EmployeeName
SalaryIndex
EmployeeSince
Finance
3
Concor Valashe
230
11/22/2007
Marketing
4
Manchini Voliore
800
5/17/2009
HR
1
Blaze Concave
501
8/25/2016
R&D
2
Janet Encarr
890
1/26/2019

0 COMMENTS

We’d like to hear from you!