Accelerator Guide

U.S./Canada General Data Cleansing Rules

Use the general data cleansing rules to identify the type of information contained in input fields.
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
The following table describes the general data cleansing rules in the U.S./Canada accelerator:
| Name | Description |
| --- | --- |
| rule_CAN_Field_Identification | Identifies the type of information that an input field contains. The rule can identify names, personal IDs, company names, dates, and Canadian address data. The rule returns a label that describes the type of input data. The rule uses reference data to identify the types of information. |
| rule_CAN_NER_Field_Identification | Identifies the type of information that an input field contains. The rule can identify names, personal IDs, company names, dates, and Canadian address data. The rule returns a label that describes the type of input data. The rule uses reference data and probabilistic matching techniques to identify the types of information. |
| rule_USA_Field_Identification | Identifies the type of information that an input field contains. The rule can identify names, personal IDs, company names, dates, and United States address data. The rule returns a label that describes the type of input data. The rule uses reference data to identify the types of information. |
| rule_Field_North_American_Data | Identifies the following types of fields: name, occupation title, company, address, city, state or province, postcode, country, personal ID, email, telephone, credit card, and date. The rule generates a score that indicates the degree of confidence in the field identification. Higher scores indicate greater levels of confidence. If the rule cannot assign a field type, the rule writes the data to the Out_Undetermined port. |
| rule_USA_NER_Field_Identification | Identifies the type of information that an input field contains. The rule can identify names, personal IDs, company names, dates, and United States address data. The rule returns a label that describes the type of input data. The rule uses reference data and probabilistic matching techniques to identify the types of information. |
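The behavior described for rule_Field_North_American_Data (assign a field type, report a confidence score, and route unidentified data to a separate output) can be illustrated with a minimal Python sketch. This is not the accelerator's implementation: the patterns, the score values, and the names `PATTERNS`, `FieldResult`, and `identify_field` are hypothetical, and the actual rule relies on reference data that this guide does not reproduce.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns and confidence scores, chosen for illustration only.
# The real rule uses reference data and covers many more field types.
PATTERNS = [
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), 0.9),
    ("telephone", re.compile(r"^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$"), 0.8),
    ("postcode", re.compile(r"^(\d{5}(-\d{4})?|[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d)$"), 0.85),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$"), 0.7),
]


@dataclass
class FieldResult:
    label: Optional[str]         # identified field type, or None
    score: float                 # higher means greater confidence
    undetermined: Optional[str]  # stands in for the Out_Undetermined port


def identify_field(value: str) -> FieldResult:
    """Return the highest-scoring label for value, or route it as undetermined."""
    text = value.strip()
    best_label, best_score = None, 0.0
    for label, pattern, score in PATTERNS:
        if pattern.match(text) and score > best_score:
            best_label, best_score = label, score
    if best_label is None:
        # No pattern matched: pass the original data through unchanged.
        return FieldResult(None, 0.0, value)
    return FieldResult(best_label, best_score, None)
```

In this sketch, a call such as `identify_field("K1A 0B1")` yields a result labeled `postcode`, while a value that matches no pattern is returned in the `undetermined` attribute with a score of zero.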

Dependencies on Core General Data Cleansing Rules

The U.S./Canada accelerator depends on the following general data cleansing rules from the Core accelerator:
  • rule_Assign_DQ_GeocodingStatus_Description
  • rule_Assign_DQ_Mailability_Score_Description
  • rule_Assign_DQ_Match_Code_Descriptions
  • rule_Date_Validation
  • rule_Remove_Extra_Spaces
  • rule_Remove_Punctuation
  • rule_Replace_Limited_Punct_with_Space
  • rule_UpperCase
For more information about these rules, see Core General Data Cleansing Rules.