Accelerator Guide

Core General Data Cleansing Rules

Use the general data cleansing rules to parse, standardize, and validate data.
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
The following table describes the general data cleansing rules in the Core accelerator:
Name
Description
mplt_Parse_Tokens_Into_Single_Field
Parses each word in a space-delimited string to a separate port.
rule_Add_Leading_Zero
Adds the numeral "0" to the beginning of a string.
rule_Add_Parentheses_At_Start_End_ofLine
Adds parenthetical symbols at the start and end of a string.
rule_Add_Plus_To_Start_of_Line
Adds the plus symbol at the start of a string.
rule_Add_Space_Around_Ampersand
Adds a space before and after all ampersands in a string.
rule_Add_Space_Around_Hyphen
Adds a space before and after all dashes and hyphens in a string.
rule_Add_Space_Between_Number_Letter
Adds a space in between a character pair composed of one numeral and one alphabetic character. Reading from left to right, the mapplet adds a space to the first numeral-alphabetic character pair in the data.
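The first-pair behavior of rule_Add_Space_Between_Number_Letter can be illustrated outside the product with a short Python sketch. This is not the mapplet's logic. It assumes that a "pair" means a numeral immediately followed by an ASCII letter, and the function name is hypothetical.

```python
import re

# Assumption: a "pair" is one numeral immediately followed by one ASCII letter.
_DIGIT_LETTER = re.compile(r"([0-9])([A-Za-z])")


def add_space_between_number_letter(text: str) -> str:
    """Insert a space into the first numeral-letter pair, reading left to right."""
    # count=1 limits the substitution to the first match only.
    return _DIGIT_LETTER.sub(r"\1 \2", text, count=1)


print(add_space_between_number_letter("12B Main 4th"))
```

Only the first pair changes, so later pairs such as "4th" in the sample input are left as they are.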
rule_Add_Spaces_Around_Period
Adds a space before and after all periods in a string.
rule_AllTrim
Removes all leading and trailing spaces from the input data fields.
rule_Assign_DQ_AddressResolutionCode_Description
Assigns a description to the Address Resolution Code output from the Address Validator transformation.
rule_Assign_DQ_ElementInputStatus_Description
Assigns a description to the Element Input Status output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_ElementRelevance_Description
Assigns a description to the Element Relevance output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_ElementResultStatus_Description
Assigns a description to the Element Result Status output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_ExtendedElementStatus_Description
Assigns a description to the Extended Element Result Status output from the Address Validator transformation.
rule_Assign_DQ_GeocodingStatus_Description
Assigns a description to the Geocoding Status output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_Mailability_Score_Description
Assigns a description to the Mailability Score output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_Match_Code_Description
Assigns a description to the Match Code output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Classify_Language
Classifies a string as one of the following languages: Arabic, Dutch, English, French, German, Italian, Portuguese, Russian, Spanish, or Turkish. The rule uses the Language_Classifier content set to identify the languages.
The rule returns a language for every string that it analyzes. If a string belongs to a language that the rule does not recognize, the rule returns the language that most closely matches the text in the string.
rule_Compare_Dates
Calculates the difference between two dates. The mapplet uses the following units of measure:
  • Hours
  • Days
  • Months
  • Years
Each output value is calculated independently of the other values. The outputs cannot be added together to represent the difference between the dates.
rule_Completeness
Checks a single port for NULL values. Returns "Complete" if the port contains data. Returns "Incomplete" if the port is empty or contains a NULL value.
rule_Completeness_Multi_Port
Checks multiple ports for NULL values. Returns "Complete" if all ports contain data. Returns "Incomplete" if any port is empty or contains a NULL value.
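The single-port and multi-port completeness checks can be sketched in Python as follows. The sketch assumes that "empty" means a zero-length string and that a NULL value maps to None. The function names are hypothetical and do not come from the accelerator.

```python
from typing import Optional, Sequence


def completeness(value: Optional[str]) -> str:
    """Return "Complete" if the value holds data, otherwise "Incomplete"."""
    # Assumption: None stands in for NULL, and "" stands in for an empty port.
    if value is None or value == "":
        return "Incomplete"
    return "Complete"


def completeness_multi_port(values: Sequence[Optional[str]]) -> str:
    """Return "Complete" only if every value in the row holds data."""
    if all(completeness(v) == "Complete" for v in values):
        return "Complete"
    return "Incomplete"


print(completeness_multi_port(["Ann", "Smith", None]))
```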
rule_Concatenate_Words
Concatenates two fields. Uses a space character as the separator.
rule_Convert_Match_Codes_to_Legacy_Values
Converts the output from the Match Code port in an Address Validator transformation to the equivalent address validation match code in Data Quality 8.6.
rule_CreditCard_Number_Validation
Validates credit card numbers for credit cards that use the Luhn algorithm. Validation includes, but is not limited to, the following credit cards:
  • American Express
  • Diners Club Carte Blanche
  • Diners Club International
  • Diners Club US & Canada
  • Discover Card
  • JCB
  • Maestro
  • Master Card
  • Solo
  • Switch
  • Visa
  • Visa Electron
The rule returns "Valid" or "Invalid."
rule_Date_Complete
Verifies that the input string conforms to a date format that the rule recognizes. The rule reads the following reference data object:
  • user_defined_dates_infa
rule_Date_of_Birth_Validation
Checks the number of years between a date of birth and the current date. Returns "Valid" together with "Adult" or "Minor" if the number of years is 120 or lower. Returns "Invalid" if the number of years is greater than 120.
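The 120-year threshold in rule_Date_of_Birth_Validation can be sketched in Python. The table does not state the age at which the rule returns "Adult", so the sketch uses 18 as an assumed, configurable value. It also assumes that the rule counts completed years. The function names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def years_between(born: date, today: date) -> int:
    """Count completed years between two dates (an assumption about the rule)."""
    years = today.year - born.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (born.month, born.day):
        years -= 1
    return years


def validate_date_of_birth(
    born: date, today: date, adult_age: int = 18
) -> Tuple[str, Optional[str]]:
    """Return ("Valid", "Adult" or "Minor"), or ("Invalid", None) above 120 years."""
    years = years_between(born, today)
    if years > 120:
        return ("Invalid", None)
    # adult_age=18 is an assumption; the source does not give the threshold.
    # The source also does not say how the rule treats future dates.
    return ("Valid", "Adult" if years >= adult_age else "Minor")


print(validate_date_of_birth(date(2010, 1, 1), date(2024, 6, 14)))
```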
rule_Date_Parse
Parses date data from a string to a port that the rule specifies. The rule recognizes dates in the following formats:
  • dd/mm/yyyy
  • mm/dd/yyyy
  • yyyy/dd/mm
The rule returns a date and also returns a string that contains the input text without the date.
rule_Date_Standardization
Standardizes date strings to an output format that you specify. To set the output format, open the dq_FormatDate Expression transformation in the rule and update the Output_Date_Format expression variable and the Delimiter expression variable. If the input data does not describe a valid date, the rule returns the digit 0 for each input character.
rule_Date_Validation
Validates date strings that appear in a single format in a data column. To configure the date format that the rule uses for validation, open the dq_ValidateDate Expression transformation in the rule and update the In_Date_Format expression variable. The default format is "MM/DD/YYYY." The rule returns "Valid" or "Invalid."
rule_Date_Validation_Variable_Format
Validates date strings that appear in multiple formats in a data column. Use the rule when a data source includes the following columns:
  • A column that contains date values in multiple formats.
  • A column that identifies the format of the date value in each row. If the column does not identify a date format for a row, the rule applies the format "MM/DD/YYYY" to the date value.
The rule reads all data values that the is_date() function recognizes. The rule returns "Valid" or "Invalid."
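The per-row format lookup with a "MM/DD/YYYY" fallback can be sketched in Python. The sketch uses datetime.strptime in place of the is_date() function, so it might accept or reject different values than the rule does. The format labels in the mapping, other than "MM/DD/YYYY", are hypothetical examples, and treating an unknown format label as "Invalid" is an assumption.

```python
from datetime import datetime
from typing import Optional

# Hypothetical mapping from format labels to strptime patterns.
FORMATS = {
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY-MM-DD": "%Y-%m-%d",
}
DEFAULT_FORMAT = "MM/DD/YYYY"  # Default stated in the rule description.


def validate_date(value: str, row_format: Optional[str] = None) -> str:
    """Validate a date string against the format named for its row."""
    # Fall back to the default when the format column is empty for the row.
    label = row_format or DEFAULT_FORMAT
    pattern = FORMATS.get(label)
    if pattern is None:
        return "Invalid"  # Assumption: an unknown format label fails validation.
    try:
        datetime.strptime(value, pattern)
    except ValueError:
        return "Invalid"
    return "Valid"


print(validate_date("31/12/2023", "DD/MM/YYYY"))
```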
rule_Days_Between_Dates
Calculates the number of days between two dates.
rule_Days_From_Current_Date
Calculates the number of days between a specified date and the current date.
rule_EAN13_Algorithm
Validates an International Article Number. The rule returns "Valid" if the check digit is correct for the number and "Invalid" if the check digit is incorrect.
rule_GTIN_Validation
Validates a Global Trade Item Number (GTIN). The rule validates eight-digit, twelve-digit, thirteen-digit, and fourteen-digit numbers. The rule returns "Valid" if the check digit is correct for the number and "Invalid" if the check digit is incorrect.
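The check digit test that rule_EAN13_Algorithm and rule_GTIN_Validation describe follows the standard GS1 calculation, which can be sketched in Python. The sketch is an illustration and not the rule's internal logic. The function names are hypothetical, and returning "Invalid" for non-numeric input or an unsupported length is an assumption.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    # Starting from the rightmost digit of the body, weights alternate 3, 1, 3, ...
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def validate_gtin(code: str) -> str:
    """Return "Valid" if the final digit matches the computed check digit."""
    # Assumption: non-numeric input and unsupported lengths are "Invalid".
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return "Invalid"
    expected = gtin_check_digit(code[:-1])
    return "Valid" if expected == int(code[-1]) else "Invalid"


print(validate_gtin("4006381333931"))
```

Because the weights are assigned from the right, the same function covers all four GTIN lengths.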
rule_IsNumeric
Verifies that the input data is numeric. The rule returns "True" or "False."
rule_LowerCase
Returns all alphabetic characters in lower case.
rule_Luhn_Algorithm
Applies the Luhn algorithm to a numeric string. The rule can validate numeric strings, such as credit card numbers.
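For reference, the Luhn algorithm that rule_Luhn_Algorithm and rule_CreditCard_Number_Validation apply can be sketched in Python. The sketch shows the standard algorithm only. The function name is hypothetical, and the Boolean return value is a simplification that does not reflect the rule's output ports.

```python
def luhn_valid(number: str) -> bool:
    """Return True if a numeric string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk the digits from right to left and double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # Same result as adding the two digits of the product.
        total += digit
    return total % 10 == 0


print(luhn_valid("4111111111111111"))
```

The sample value 4111111111111111 is a widely used test number and not a real account number.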
rule_Mask_Profanity
Checks input data for profanity. Masks profanity as "CENSORED" in the output data.
rule_Negative_Number_Validation
Validates that the input data is a negative number.
rule_Numeric_Completeness
Checks for NULL values in numeric inputs.
rule_Parse_Alpha_Chars_from_Non_Alpha_Chars
Identifies the alphabetic characters and the non-alphabetic characters in an input string and writes each set of characters to different output ports. For example, the rule parses the input string teststring_123 into the following values:
  • teststring
  • _123
rule_Parse_First_Word
Parses the first word in an input string to a port that the rule specifies.
rule_Parse_Number_At_End_Of_Line
Parses any number that occurs at the end of an input string to a port that the rule specifies. The rule reads strings from left to right.
rule_Parse_Number_At_Start_Of_Line
Parses any number that occurs at the start of an input string to a port that the rule specifies. The rule reads strings from left to right.
rule_Parse_Profanity
Compares strings to a reference table of profane terms and parses any term that matches a reference table value to a port that the rule specifies.
rule_Parse_Text_Between_Parentheses
Parses strings that are enclosed in parentheses to a port that the rule specifies. The rule contains an output port for the parsed strings and an output port for the input text without the parsed strings.
rule_Parse_Text_in_Single_Quotes
Parses strings that are enclosed in single quotation marks to a port that the rule specifies. When the input data contains multiple quoted elements, the rule parses the final element. The rule reads the input strings from left to right. The rule contains an output port for the parsed strings and an output port for the input text without the parsed strings.
rule_Past_Date_Label
Determines whether an input date is earlier than the system date or later than the system date.
rule_Personal_Company_Identification
Parses person names and company names to different ports that the rule specifies. The rule has the following outputs:
  • Person name
  • Company name
  • Data category, such as person name or company name
  • Data that the rule cannot parse
rule_Postive_Number_Validation
Verifies that the input data is a positive number.
rule_Prepend_Zero_to_Single_Digit
Prepends the numeral "0" to single numeric characters.
rule_Remove_All_Leading_Zeros
Removes all instances of the numeric character "0" from the beginning of a string.
rule_Remove_Apostrophe
Removes apostrophes. The rule merges the text strings on either side of the apostrophe.
rule_Remove_Control_Characters
Removes control characters from text strings. The rule returns a string that contains the control characters and a string that contains the input text without the control characters.
rule_Remove_Extra_Spaces
Replaces all consecutive spaces with a single space and trims leading and trailing spaces.
rule_Remove_Hyphen
Removes hyphens.
rule_Remove_Leading_Zero
Removes a single instance of the numeric character "0" from the beginning of a string.
rule_Remove_Limited_Punctuation
Removes extraneous characters. Extraneous characters include slashes, back slashes, periods, exclamation marks, underscores, and multiple consecutive spaces.
rule_Remove_Non_Numbers
Removes all characters that are not numeric.
rule_Remove_Parentheses
Removes right and left parenthesis symbols.
rule_Remove_Period
Removes periods.
rule_Remove_Period_Parentheses
Removes the following characters:
  • Left and right parentheses
  • Periods
rule_Remove_Punctuation
Removes punctuation symbols.
rule_Remove_Punctuation_and_Space
Removes all punctuation and all space characters.
rule_Remove_Quotation
Removes quotation marks.
rule_Remove_Slashes
Removes forward slashes and back slashes.
rule_Remove_Space
Removes all space characters.
rule_Replace_Ampersand_With_Space
Replaces ampersands with spaces.
rule_Replace_Hyphen_Underscore_with_Space
Replaces hyphens and underscores with spaces.
rule_Replace_Hyphen_with_Space
Replaces hyphens with spaces.
rule_Replace_Limited_Punct_with_Space
Replaces the following punctuation characters with a single space: dash, back slash, period, exclamation mark, and underscore. The rule also replaces two, three, and four consecutive spaces with a single space.
rule_Replace_Non_Alphabetic_with_Space
Replaces numerals and punctuation characters with a single space.
rule_Replace_Period_With_Space
Replaces periods with a single space.
rule_Replace_Punctuation_with_Space
Replaces all punctuation with spaces.
rule_Replace_Slashes_With_Space
Replaces forward slashes and back slashes with spaces.
rule_Reverse_String_Input
Reverses the order of characters in input strings.
rule_String_Completeness
Checks a string for completeness. The rule also searches the input strings for values in the reference table string_default_values_infa. The reference table contains values such as NA, DEFAULT, and XX. If an input string contains a value in the reference table, the rule identifies the string as incomplete.
rule_TitleCase
Converts strings to title case. In title case strings, the first letter of each word is capitalized.
rule_Translate_Diacritic_Characters
Replaces diacritic characters with ASCII equivalents. For example, the rule converts "ã" to "a".
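One common way to replace diacritic characters with ASCII equivalents is Unicode decomposition, sketched here in Python. The table does not document how rule_Translate_Diacritic_Characters maps characters, so the rule might produce different output for some inputs. The function name is hypothetical.

```python
import unicodedata


def translate_diacritics(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD)."""
    # NFD splits a character such as "ã" into "a" plus a combining tilde.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(translate_diacritics("São João"))
```

Characters that have no canonical decomposition, such as "ß" or "ø", pass through this sketch unchanged.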
rule_UpperCase
Returns all alphabetic characters in upper case.
rule_URL_Validation
Validates the format and structure of a URL.
rule_Years_Since_Date_of_Birth
Calculates the number of years since the input date.