Example: Use partitions to find the highest salary

You are an HR staff member at your organization. You are working on a project to model how employee salaries are associated with aspects of life that employees find important. The project is part of the wellness program at your organization. You want to use the information to better personalize the wellness program.

You can use the Python transformation to determine which employee earns the highest salary in their department.

The following table shows the data that your organization might collect:

DepartmentName	DepartmentID	EmployeeName	SalaryIndex	EmployeeSince
HR	1	Jane Smith	500	2/16/2010
R&D	2	Ellioth Consar	150	3/29/2018
Finance	3	Concor Valashe	230	11/22/2007
Marketing	4	Manchini Voliore	800	5/17/2009
HR	1	Blaze Concave	501	8/25/2016
R&D	2	Janet Encarr	890	1/26/2019
HR	1	Chelsea Blanch	389	9/3/2018
R&D	1	Samuel Coin	10	1/26/2005

To use the Python transformation to determine which employee earns the highest salary in their department, perform the following tasks:

Step 1. Add a Python transformation to the mapping.
Create a Python transformation. On the Advanced tab, set the behavior to Active.
Step 2. Pass data to the Python transformation.
Pass the following fields from upstream transformations in the mapping to the Python transformation:
DepartmentName
DepartmentID
EmployeeName
SalaryIndex
EmployeeSince
Step 3. Partition the data by department.
Partition the data by department to track the highest salary within each department. To partition the data by department, add the incoming field DepartmentID as a partition key on the Partition Keys tab.
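
To picture what a partition key does, the grouping can be sketched in plain Python outside the transformation. This is only a conceptual illustration: the runtime decides how rows are actually distributed, and the names `rows` and `partition_by_key` are hypothetical helpers, not part of the product.

```python
from collections import defaultdict

# Sample rows from the input table:
# (DepartmentName, DepartmentID, EmployeeName, SalaryIndex, EmployeeSince)
rows = [
    ("HR", 1, "Jane Smith", 500, "2/16/2010"),
    ("R&D", 2, "Ellioth Consar", 150, "3/29/2018"),
    ("Finance", 3, "Concor Valashe", 230, "11/22/2007"),
    ("Marketing", 4, "Manchini Voliore", 800, "5/17/2009"),
    ("HR", 1, "Blaze Concave", 501, "8/25/2016"),
    ("R&D", 2, "Janet Encarr", 890, "1/26/2019"),
    ("HR", 1, "Chelsea Blanch", 389, "9/3/2018"),
    ("R&D", 1, "Samuel Coin", 10, "1/26/2005"),
]


def partition_by_key(rows, key_index):
    """Group rows by the value found at key_index (hypothetical helper)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row)
    return dict(partitions)


# DepartmentID is the second field, so key_index is 1.
partitions = partition_by_key(rows, key_index=1)
for dept_id, members in sorted(partitions.items()):
    print(dept_id, [member[2] for member in members])
```

Because the key is DepartmentID and not DepartmentName, the row for Samuel Coin, which carries DepartmentID 1, is grouped with the other rows that have DepartmentID 1.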
Step 4. Create output fields.
Create the following output fields on the Output Fields tab to pass data to downstream transformations:
DepartmentName_out
DepartmentID_out
EmployeeName_out
SalaryIndex_out
EmployeeSince_out
Step 5. Initialize a map.
Declare a map variable outputmap to associate each department ID with the employee in the department who has the highest salary.

Add the following code in the Pre-Partition Python Code section:

print("Using partitions to find the employee with the highest salary")
outputmap = {}
Step 6. Define code to process the data.
For each input row that passes through the Python transformation, define code that checks whether the employee's salary is higher than the highest salary among the rows already processed for that department. If it is, record that employee as the one with the maximum salary in the department.

Add the following code in the Main Python Code section:

DepartmentID_out = DepartmentID
print("Processing rows for department ID " + str(DepartmentID_out))
outputmap.setdefault(DepartmentID, None)
updateMax = False
if outputmap.get(DepartmentID, None) is None:
    # First row seen for this department
    updateMax = True
else:
    max_salary = outputmap[DepartmentID]['SalaryIndex']
    if max_salary is None:
        updateMax = True
    elif SalaryIndex > max_salary:
        # elif avoids comparing a number with None
        updateMax = True
if updateMax:
    employee_data = {'SalaryIndex': SalaryIndex, 'EmployeeName': EmployeeName,
                     'EmployeeSince': EmployeeSince, 'DepartmentName': DepartmentName}
    outputmap[DepartmentID] = employee_data
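
The update rule in this step can be isolated as a small function and checked outside the transformation. The following is a minimal sketch; `should_update` is a hypothetical helper introduced here for illustration, and it assumes the recorded entry is a dictionary with a `SalaryIndex` key, as in the code for this step.

```python
def should_update(current_best, salary):
    """Return True if `salary` should replace the recorded maximum.

    current_best is None when no row has been recorded for the department yet;
    otherwise it is a dict that holds the recorded 'SalaryIndex'.
    """
    if current_best is None:
        return True
    recorded = current_best.get("SalaryIndex")
    if recorded is None:
        # Nothing comparable is stored, so accept the new value.
        return True
    # Strict comparison: on a tie, the earlier employee is kept.
    return salary > recorded


print(should_update(None, 500))
print(should_update({"SalaryIndex": 500}, 501))
print(should_update({"SalaryIndex": 500}, 389))
```

Handling the missing-value cases before the numeric comparison keeps the function from comparing a number with `None`, which raises a `TypeError` in Python 3.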
Step 7. Write the data to the output files.
In the Post-Partition Python Code section of the Python tab, use the data in the map variable outputmap to generate a row for the employee that has the highest salary in each department.

Add the following code in the Post-Partition Python Code section:

for x in outputmap:
    DepartmentID_out = x
    smap = outputmap[x]
    SalaryIndex_out = smap["SalaryIndex"]
    EmployeeName_out = smap["EmployeeName"]
    DepartmentName_out = smap["DepartmentName"]
    EmployeeSince_out = smap["EmployeeSince"]
    ## Generate the output row
    generateRow()
Step 8. Run the mapping.
If the output fields in the Python transformation are linked directly to a Target transformation, the target contains the following data after you run the mapping:

DepartmentName	DepartmentID	EmployeeName	SalaryIndex	EmployeeSince
Finance	3	Concor Valashe	230	11/22/2007
Marketing	4	Manchini Voliore	800	5/17/2009
HR	1	Blaze Concave	501	8/25/2016
R&D	2	Janet Encarr	890	1/26/2019
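
The result can be checked with a standalone sketch that mimics the three code sections in ordinary Python. It is a simulation under stated assumptions, not the transformation itself: the function `simulate` and the list `generated` are hypothetical stand-ins, appending to `generated` takes the place of `generateRow()`, and all rows are treated as a single partition. The order of the printed rows may differ from the order in the target.

```python
FIELDS = ("DepartmentName", "DepartmentID", "EmployeeName",
          "SalaryIndex", "EmployeeSince")

# Sample rows copied from the input table in this example.
RAW = [
    ("HR", 1, "Jane Smith", 500, "2/16/2010"),
    ("R&D", 2, "Ellioth Consar", 150, "3/29/2018"),
    ("Finance", 3, "Concor Valashe", 230, "11/22/2007"),
    ("Marketing", 4, "Manchini Voliore", 800, "5/17/2009"),
    ("HR", 1, "Blaze Concave", 501, "8/25/2016"),
    ("R&D", 2, "Janet Encarr", 890, "1/26/2019"),
    ("HR", 1, "Chelsea Blanch", 389, "9/3/2018"),
    ("R&D", 1, "Samuel Coin", 10, "1/26/2005"),
]
rows = [dict(zip(FIELDS, raw)) for raw in RAW]


def simulate(rows):
    """Mimic the pre-partition, main, and post-partition sections."""
    generated = []  # stands in for the rows emitted by generateRow()

    # Pre-partition section: initialize the map once.
    outputmap = {}

    # Main section: runs once for each input row.
    for row in rows:
        best = outputmap.get(row["DepartmentID"])
        if best is None or row["SalaryIndex"] > best["SalaryIndex"]:
            outputmap[row["DepartmentID"]] = row

    # Post-partition section: emit one row per department ID.
    for dept_id in outputmap:
        generated.append(outputmap[dept_id])
    return generated


winners = simulate(rows)
for w in winners:
    print(w["DepartmentName"], w["DepartmentID"], w["EmployeeName"],
          w["SalaryIndex"], w["EmployeeSince"])
```

Because the map is keyed by department ID, the sketch selects the same four employees even though it does not split the rows into separate partitions.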