Table of Contents

Search

  1. Preface
  2. Transformations
  3. Source transformation
  4. Target transformation
  5. Aggregator transformation
  6. Cleanse transformation
  7. Data Masking transformation
  8. Deduplicate transformation
  9. Expression transformation
  10. Filter transformation
  11. Hierarchy Builder transformation
  12. Hierarchy Parser transformation
  13. Hierarchy Processor transformation
  14. Input transformation
  15. Java transformation
  16. Java transformation API reference
  17. Joiner transformation
  18. Labeler transformation
  19. Lookup transformation
  20. Mapplet transformation
  21. Normalizer transformation
  22. Output transformation
  23. Parse transformation
  24. Python transformation
  25. Rank transformation
  26. Router transformation
  27. Rule Specification transformation
  28. Sequence Generator transformation
  29. Sorter transformation
  30. SQL transformation
  31. Structure Parser transformation
  32. Transaction Control transformation
  33. Union transformation
  34. Velocity transformation
  35. Verifier transformation
  36. Web Services transformation

Transformations

Transformations

Example: Use partitions to find the highest salary

Example: Use partitions to find the highest salary

You are an HR staff member at your organization. You are working on a project to model how employee salaries are associated with aspects of life that employees find important. The project is part of the wellness program at your organization. You want to use the information to better personalize the wellness program.
You can use the Python transformation to determine which employee earns the highest salary in their department.
The following table shows the data that your organization might collect:
DepartmentName
DepartmentID
EmployeeName
SalaryIndex
EmployeeSince
HR
1
Jane Smith
500
2/16/2010
R&D
2
Ellioth Consar
150
3/29/2018
Finance
3
Concor Valashe
230
11/22/2007
Marketing
4
Manchini Voliore
800
5/17/2009
HR
1
Blaze Concave
501
8/25/2016
R&D
2
Janet Encarr
890
1/26/2019
HR
1
Chelsea Blanch
389
9/3/2018
R&D
1
Samuel Coin
10
1/26/2005
To use the Python transformation to determine which employee earns the highest salary in their department, perform the following tasks:
Step 1. Add a Python transformation to the mapping.
Create a Python transformation. On the
Advanced
tab, set the behavior to Active.
Step 2. Pass data to the Python transformation.
Pass the following fields from upstream transformations in the mapping to the Python transformation:
  • DepartmentName
  • DepartmentID
  • EmployeeName
  • SalaryIndex
  • EmployeeSince
Step 3. Partition the data by department.
Partition the data by department to track the highest salary within each department. To partition the data by department, add the incoming field
DepartmentID
as a partition key on the
Partition Keys
tab.
Step 4. Create output fields.
Create the following output fields on the
Output Fields
tab to pass data to downstream transformations:
  • DepartmentName_out
  • DepartmentID_out
  • EmployeeName_out
  • SalaryIndex_out
  • EmployeeSince_out
Step 5. Initialize a map.
Declare a map variable
outputmap
to associate each department ID with the employee in the department who has the highest salary.
Add the following code in the
Pre-Partition Python Code
section:
print("Using partitions to find the employee with the highest salary") outputmap = {}
Step 6. Define code to process the data.
For each input row that passes through the Python transformation, define code that checks if the salary of the employee is higher than the maximum salary of the previous rows that have been processed. If the salary of the employee is higher, update the employee who has the maximum salary in the department.
Add the following code in the
Main Python Code
section:
DepartmentID_out = DepartmentID print("Processing rows for department ID " + str(DepartmentID_out)) outputmap.setdefault(DepartmentID, None) updateMax = False if outputmap.get(DepartmentID, None) is None: updateMax = True else: max_salary = outputmap[DepartmentID]['SalaryIndex'] if max_salary is None: updateMax = True if SalaryIndex > max_salary: updateMax = True if updateMax == True: employee_data = {'SalaryIndex':SalaryIndex,'EmployeeName':EmployeeName, 'EmployeeSince':EmployeeSince,'DepartmentName':DepartmentName} outputmap[DepartmentID] = employee_data
Step 7. Write the data to the output files.
In the
Post-Partition Python Code
section of the
Python
tab, use the data in the map variable
outputmap
to generate a row for the employee that has the highest salary in each department.
Add the following code in the
Post-Partition Python Code
section:
for x in outputmap: DepartmentID_out = x smap = outputmap[x] SalaryIndex_out = smap["SalaryIndex"] EmployeeName_out = smap["EmployeeName"] DepartmentName_out = smap["DepartmentName"] EmployeeSince_out = smap["EmployeeSince"] ## Generate the output row generateRow()
Step 8. Run the mapping.
If the output fields in the Python transformation are linked directly to a Target transformation, the target contains the following data after you run the mapping:
DepartmentName
DepartmentID
EmployeeName
SalaryIndex
EmployeeSince
Finance
3
Concor Valashe
230
11/22/2007
Marketing
4
Manchini Voliore
800
5/17/2009
HR
1
Blaze Concave
501
8/25/2016
R&D
2
Janet Encarr
890
1/26/2019