Preface
Transformations
- Active and passive transformations
- Transformation types
- Licensed transformations
- Incoming fields
  - Field name conflicts
    - Creating a field name conflict resolution
  - Field rules
- Data object preview
- Variable fields
- Transformation caches
- Expression macros
- File lists
- Configuration for multibyte hierarchical data
Source transformation
- Source object
- File sources
- Database sources
- Web service sources
- Partitions
  - Partitioning rules and guidelines
  - Partitioning examples
- Reading hierarchical data in advanced mode
- Reading documents in advanced mode
- Configuration for multibyte hierarchical data
- Source fields
  - Editing native data types in complex file sources
  - Editing transformation data types
Target transformation
- Target object
  - Target file creation on advanced clusters
- File targets
- Database targets
- Web service targets
  - Web service operations for targets
  - Field mapping for web service targets
- Partitions
- Writing hierarchical data in advanced mode
- Configuration for multibyte hierarchical data
- Target fields
- Target transformation field mappings
- Configuring a Target transformation
Access Policy transformation
- Using parameters in Access Policy transformations
- Data filter policy best practices
- Access Policy transformation configuration
- Access Policy transformation example
Aggregator transformation
- Group by fields
- Sorted data
- Aggregate fields
- Advanced properties
- Hierarchical data in advanced mode
- Aggregator transformation example
B2B transformation
- B2B Incoming Fields
- B2B settings
- Output fields
- Field mapping
- Advanced settings
Chunking transformation
- Chunking methods
- Chunking output fields
Cleanse transformation
- Cleanse transformation configuration
  - Cleanse asset considerations
  - Synchronizing data quality assets
- Cleanse transformation field mappings
- Cleanse transformation output fields
- Advanced properties
Data Masking transformation
- Masking techniques
- Configuration properties for masking techniques
- Credit card masking
- Email masking
  - Advanced email masking
- IP address masking
- Key masking
- Phone number masking
- Random masking
- Social Insurance number masking
- Social Security number masking
- Custom substitution masking
- Dependent masking
  - Dependent masking parameters
- Substitution masking
- URL address masking
- Mask rule parameter
- Mask rule parameter example
  - Create a mapping with parameters
  - Run the mapping
- Creating a Data Masking transformation
- Consistent masked output
  - Rules and guidelines
  - Example
- Data Masking transformation example
Data Services transformation
- Dynamic service name
- Status tracing messages
- Data Services properties
- Data Services transformation input fields
- Data Services transformation output fields
- Data Services transformation field mapping
Deduplicate transformation
- Deduplication and consolidation operations
- Identity population data
- Groups in duplicate analysis
  - Example: Selecting a group key column
- Deduplicate transformation configuration
- Deduplicate transformation field mappings
- Metadata fields on the Deduplicate transformation
- Link scores and driver scores
- Deduplicate transformation output fields
- Advanced properties
Expression transformation
- Expression fields
- Expression editor
- Transformation language components for expressions
- Expression syntax
- String and numeric literals
- Adding comments to expressions
- Reserved words
- Window functions
  - Frame
  - Partition and order keys
- Example: Use a window to calculate expiration dates
- Example: Use a window to flag GPS pings
- Example: Run an aggregate function on a window
- Advanced properties
- Hierarchical data in advanced mode
Filter transformation
- Filter conditions
- Advanced properties
- Hierarchical data in advanced mode
Hierarchy Builder transformation
- Configure output settings
- Join and map fields for data conversion
  - Joining incoming data
  - Mapping relational fields to hierarchy fields
- Configure advanced properties
- Configuration for multibyte hierarchical data
- Hierarchy Builder transformation example
Hierarchy Parser transformation
- Using a Hierarchy Parser transformation
- Hierarchy Parser rules and guidelines
- Choosing a sample or schema file
- Hierarchical schemas
  - Rules and guidelines for hierarchical schemas
  - Creating a hierarchical schema
- Input settings
  - Selecting a hierarchical schema
  - Creating a hierarchical schema from sample
- Input field selection
- Field mapping
  - Selecting the elements to convert
- Output fields
- Selecting an output group
- Configuration for multibyte hierarchical data
- Hierarchy Parser transformation example
Hierarchy Processor transformation
- Hierarchy Processor transformation overview
- Processing relational output
- Processing hierarchical output
- Processing flattened output
Input transformation
- Input fields
Java transformation
- Defining a Java transformation
- Classpath configuration
- Java transformation fields
- Configuring Java transformation properties
- Developing the Java code
- Compiling the code
  - Viewing the full class code
- Troubleshooting a Java transformation
  - Finding the source of compilation errors
  - Identifying the error type
- Java transformation example
Java transformation API reference
- failSession
- generateRow
- getInRowType
- incrementErrorCount
- invokeJExpression
- isNull
- logError
- logInfo
- setNull
- setOutRowType
Joiner transformation
- Join condition
- Join type
- Advanced properties
- Hierarchical data in advanced mode
- Creating a Joiner transformation
- Joiner transformation example
Labeler transformation
- Labeler transformation configuration
- Labeler transformation field mappings
- Labeler transformation output fields
Lookup transformation
- Lookup object
  - Lookup object properties
    - Multiple match policy restrictions
  - Custom queries
- Lookup condition
- Lookup return fields
- Advanced properties
- Lookup SQL overrides
- Lookup source filter
- Dynamic lookup cache
- Persistent lookup cache
  - Rebuilding the lookup cache
- Unconnected lookups
  - Configuring an unconnected Lookup transformation
  - Calling an unconnected lookup from another transformation
- Connected Lookup example
- Dynamic Lookup example
- Unconnected Lookup example
Machine Learning transformation
- Deploying the model as a REST endpoint
- Accessing the machine learning model
- Mapping fields to the request schema
  - Mapping hierarchical fields
  - Request mapping options
- Viewing response fields
- Configuring bulk requests
  - Bulk request options
- Configuring an API proxy
- Troubleshooting
- Error handling
- Machine Learning transformation example
Mapplet transformation
- Mapplet transformation configuration
- Selecting a mapplet
- Mapplet transformation field mappings
- Mapplet parameters
- Mapplet transformation output fields
- Mapplet transformation names
- Synchronizing a mapplet
Normalizer transformation
- Normalized fields
- Normalizer field mapping
  - Normalizer field mapping options
- Advanced properties
- Target configuration for Normalizer transformations
- Normalizer field rule for parameterized sources
- Mapping example with a Normalizer and Aggregator
Output transformation
- Output fields
  - Generating output fields based on incoming fields
- Field mapping
Parse transformation
- Parse transformation configuration
- Parse transformation field mappings
- Parse transformation output fields
- Advanced properties
Python transformation
- Install and configure Python
- Python transformation fields
- Active and passive Python transformations
- Resource files
- Developing the Python code
  - Creating Python code snippets
  - Referencing a resource file
- Example: Add an ID column to nonpartitioned data
- Example: Use partitions to find the highest salary
- Example: Operationalize a pre-trained model
Rank transformation
- Ranking string values
- Rank caches
- Defining a Rank transformation
- Rank transformation fields
- Defining rank properties
- Defining rank groups
- Advanced properties
- Hierarchical data in advanced mode
- Rank transformation example
Router transformation
- Working with groups
  - Guidelines for connecting output groups
- Group filter conditions
  - Configuring a group filter condition
- Advanced properties
- Hierarchical data in advanced mode
- Router transformation examples
Rule Specification transformation
- Rule Specification transformation configuration
- Rule Specification transformation field mappings
- Rule Specification transformation output fields
- Advanced properties
Sequence transformation
- Sequence transformation uses
- Sequence output fields
- Sequence properties
  - Disabling incoming fields
- Hierarchical data in advanced mode
- Sequence transformation rules and guidelines
- Sequence transformation example
Sorter transformation
- Sort conditions
- Sorter caches
- Advanced properties
- Hierarchical data in advanced mode
- Sorter transformation example
SQL transformation
- Stored procedure or function processing
- Connected or unconnected SQL transformation for stored procedure processing
- Unconnected SQL transformations
- Query processing
- SQL transformation configuration
Structure Parser transformation
- Processing input from a Hadoop Files source
- Processing input from a flat file source
  - Configuring the flat file source
  - Configuring the Structure Parser transformation to access flat files
- Structure Parser field mapping
- Output fields
- Advanced properties
- Structure Parser transformation configuration
- Rules and guidelines for the Structure Parser transformation
- Structure Parser transformation example
Transaction Control transformation
- Transaction control condition
- Using Transaction Control transformations in mappings
  - Sample transaction control mappings with multiple targets
- Guidelines for using Transaction Control transformations in mappings
- Advanced properties
Union transformation
- Comparison to Joiner transformation
- Planning to use a Union transformation
- Input groups
- Output fields
- Field mappings
- Advanced properties
- Union Transformation example
Vector Embedding transformation
- Vector embedding techniques
- Vector embedding output fields
Velocity transformation
- Velocity transformation input format
  - Source configuration for file sources
- Velocity template
- Testing the template
- Velocity transformation output
  - Target configuration for file targets
- Velocity transformation parsers
- Examples
  - XML conversion example
  - JSON conversion example
Verifier transformation
- Address Reference Data
- Verifier transformation configuration
- Verifier transformation field mappings
  - Understanding input and output mappings
- Verifier transformation output fields
- Advanced properties
Web Services transformation
- Create a Web Services consumer connection
- Define a business service
- Configure the Web Services transformation
- Web Services transformation example
- Configuration for multibyte hierarchical data

Transformations

Back Next

Metadata fields on the Deduplicate transformation

The Deduplicate transformation includes a set of predefined fields that contain metadata for the deduplication and consolidation processes. The transformation creates the fields by default and populates the fields when the mapping runs.

Metadata fields on the Field Mapping tab

The

Target Fields

list in the Field Mappings tab includes the following metadata fields:

GroupKey: Contains the data values that the transformation uses to sort input records into groups for duplicate analysis.
SequenceId: Contains a unique identifier for each record that enters the transformation.; The transformation uses the sequence ID values to identify records in the Out_DriverId and Out_LinkId data. If you do not map the SequenceId field, the transformation uses the values on the OutRowId field as unique identifiers for the records.

Metadata fields on the Output Fields tab

The Output Fields tab includes the following metadata fields:

Out_ClusterId: Contains the identifiers of the cluster to which each record belongs.; In the deduplication process, a cluster is a set of records whose data values match each other to a degree that exceeds the duplicate threshold. Records in the same set are likely to identify the same identity. A set may contain a single record, as every unique record is a perfect match with itself.
Out_ClusterSize: Contains the number of records in the set to which the current record belongs. When a set contains a unique record, the cluster size is 1.
Out_DriverId: Contains the identifier of the driver record in each matching record set. The driver record is the record in the set with the lowest value on the SequenceId input field. If the transformation does not use the SequenceId field, the driver record is the record in the matching set with the lowest Out_RowId value.
Out_DriverScore: Contains the score that represents the degree of similarity between the current record and the driver record in the matching record set.
Out_IsSurvivor: Contains an identifier for the preferred record that a consolidation process specifies.
Out_LinkId: Contains the identifier of the record that matched with the current record and linked it to the matching record set.
Out_LinkScore: Contains the score between two records that results in the addition of a record to a matching record set. The Out_LinkId field identifies the record with which the current record shares the link score.
Out_RowId: Contains a unique identifier for each record in the mapping source data set.; The transformation uses the Out_RowId values to identify records if you do not map a field of unique identifiers to the SequenceId field.

Selecting metadata fields

The metadata fields can provide important information about the relationship between duplicate records. For example, the metadata includes the Out_LinkScore field, which represents the degree of similarity between two records as a numerical value. If you select the Out_LinkScore field, select the Out_LinkId field also. The Out_LinkId field identifies the other record in the pair of records that the Out_LinkScore value describes.

The Out_DriverId value provides a benchmark for all records in a matching record set. The Out_DriverId value is the score between the current record and the record in the set with the lowest sequence ID or row ID value. The record with the lowest ID is also the first record that the deduplication process added to the set.

Deduplicate transformation

Download Guide

Watch

Comments

Cloud Data Integration Homepage