Accelerator Guide

Core General Data Cleansing Rules

Use the general data cleansing rules to parse, standardize, and validate data.

Find the general data cleansing rules in the following repository location:

[Informatica_DQ_Content]\Rules\General_Data_Cleansing

The following table describes the general data cleansing rules in the Core accelerator:

Name	Description
mplt_Parse_Tokens_Into_Single_Field	Parses each word in a space-delimited string to a separate port.
rule_Add_Leading_Zero	Adds the numeral "0" to the beginning of a string.
rule_Add_Parentheses_At_Start_End_ofLine	Adds parenthetical symbols at the start and end of a string.
rule_Add_Plus_To_Start_of_Line	Adds the plus symbol at the start of a string.
rule_Add_Space_Around_Ampersand	Adds a space before and after all ampersands in a string.
rule_Add_Space_Around_Hyphen	Adds a space before and after all dashes and hyphens in a string.
rule_Add_Space_Between_Number_Letter	Adds a space in between a character pair composed of one numeral and one alphabetic character. Reading from left to right, the mapplet adds a space to the first numeral-alphabetic character pair in the data.
rule_Add_Spaces_Around_Period	Adds a space before and after all periods in a string.
rule_AllTrim	Removes all leading and trailing spaces from the input data fields.
rule_Assign_DQ_AddressResolutionCode_Description	Assigns a description to the Address Resolution Code output from the Address Validator transformation.
rule_Assign_DQ_ElementInputStatus_Description	Assigns a description to the Element Input Status output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_ElementRelevance_Description	Assigns a description to the Element Relevance output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_ElementResultStatus_Description	Assigns a description to the Element Result Status output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_ExtendedElementStatus_Description	Assigns a description to the Extended Element Result Status output from the Address Validator transformation.
rule_Assign_DQ_GeocodingStatus_Description	Assigns a description to the Geocoding Status output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_Mailability_Score_Description	Assigns a description to the Mailability Score output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_Match_Code_Description	Assigns a description to the Match Code output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Classify_Language	Classifies a string as one of the following languages: Arabic, Dutch, English, French, German, Italian, Portuguese, Russian, Spanish, or Turkish. The rule uses the Language_Classifier content set to identify the languages. The rule returns a language for every string that it analyzes. If a string belongs to a language that the rule does not recognize, the rule returns the language that most closely matches the text in the string.
rule_Compare_Dates	Calculates the difference between two dates. The mapplet uses the following units of measure: hours, days, months, and years. Each output value is independent of the other values, so the outputs cannot be added together to represent the difference between the data values.
rule_Completeness	Checks a single port for NULL values. Returns "Complete" if the port contains data. Returns "Incomplete" if the port is empty or contains a NULL value.
rule_Completeness_Multi_Port	Checks multiple ports for NULL values. Returns "Complete" if all ports contain data. Returns "Incomplete" if any port is empty or contains a NULL value.
rule_Concatenate_Words	Concatenates two fields. Uses a character space as a separator.
rule_Convert_Match_Codes_to_Legacy_Values	Converts the output from the Match Code port in an Address Validator transformation to the equivalent address validation match code in Data Quality 8.6.
rule_CreditCard_Number_Validation	Validates credit card numbers for credit cards that use the Luhn algorithm. Validation includes, but is not limited to, the following credit cards: American Express, Diners Club Carte Blanche, Diners Club International, Diners Club US & Canada, Discover Card, JCB, Maestro, Master Card, Solo, Switch, Visa, and Visa Electron. The rule returns "Valid" or "Invalid."
rule_Date_Complete	Verifies that the input string conforms to a date format that the rule recognizes. The rule reads the following reference data object: user_defined_dates_infa
rule_Date_of_Birth_Validation	Checks the number of years between a date of birth and the current date. Returns "Adult" or "Minor" in addition to "Valid" if the number of years is 120 or lower. Returns "Invalid" if the number of years is greater than 120.
rule_Date_Parse	Parses date data from a string to a port that the rule specifies. The rule recognizes dates in the following formats: dd/mm/yyyy, mm/dd/yyyy, and yyyy/dd/mm. The rule returns a date and also returns a string that contains the input text without the date.
rule_Date_Standardization	Standardizes date strings to an output format that you specify. To set the output format, open the dq_FormatDate Expression transformation in the rule and update the Output_Date_Format expression variable and the Delimiter expression variable. If the input data does not describe a valid date, the rule returns the digit 0 for each input character.
rule_Date_Validation	Validates date strings that appear in a single format in a data column. To configure the date format that the rule uses for validation, open the dq_ValidateDate Expression transformation in the rule and update the In_Date_Format expression variable. The default format is "MM/DD/YYYY." The rule returns "Valid" or "Invalid."
rule_Date_Validation_Variable_Format	Validates date strings that appear in multiple formats in a data column. Use the rule when a data source includes the following columns: a column that contains date values in multiple formats, and a column that identifies the format of the date value in each row. If the column does not identify a date format for a row, the rule applies the format "MM/DD/YYYY" to the date value. The rule reads all data values that the `is_date()` function recognizes. The rule returns "Valid" or "Invalid."
rule_Days_Between_Dates	Calculates the number of days between two dates.
rule_Days_From_Current_Date	Calculates the number of days between a specified date and the current date.
rule_EAN13_Algorithm	Validates an International Article Number. The rule returns "Valid" if the check digit is correct for the number and "Invalid" if the check digit is incorrect.
rule_GTIN_Validation	Validates a Global Trade Item Number (GTIN). The rule validates eight-digit, twelve-digit, thirteen-digit, and fourteen-digit numbers. The rule returns "Valid" if the check digit is correct for the number and "Invalid" if the check digit is incorrect.
rule_IsNumeric	Verifies that the input data is numeric. The rule returns "True" or "False."
rule_LowerCase	Returns all alphabetic characters in lower case.
rule_Luhn_Algorithm	Applies the Luhn algorithm to a numeric string. The rule can validate numeric strings, such as credit card numbers.
rule_Mask_Profanity	Checks input data for profanity. Masks profanity as "CENSORED" in the output data.
rule_Negative_Number_Validation	Validates that the input data is a negative number.
rule_Numeric_Completeness	Checks for NULL values in numeric inputs.
rule_Parse_Alpha_Chars_from_Non_Alpha_Chars	Identifies the alphabetic characters and the non-alphabetic characters in an input string and writes each set of characters to different output ports. For example, the rule parses the following values from the input string "teststring_123": "teststring" and "_123".
rule_Parse_First_Word	Parses the first word in an input string to a port that the rule specifies.
rule_Parse_Number_At_End_Of_Line	Parses any number that occurs at the end of an input string to a port that the rule specifies. The rule reads strings from left to right.
rule_Parse_Number_At_Start_Of_Line	Parses any number that occurs at the start of an input string to a port that the rule specifies. The rule reads strings from left to right.
rule_Parse_Profanity	Compares strings to a reference table of profane terms and parses any term that matches a reference table value to a port that the rule specifies.
rule_Parse_Text_Between_Parentheses	Parses strings that are enclosed in parentheses to a port that the rule specifies. The rule contains an output port for the parsed strings and an output port for the input text without the parsed strings.
rule_Parse_Text_in_Single_Quotes	Parses strings that are enclosed in quotation marks to a port that the rule specifies. When the input data contains multiple quoted elements, the rule parses the final element. The rule reads the input strings from left to right. The rule contains an output port for the parsed strings and an output port for the input text without the parsed strings.
rule_Past_Date_Label	Determines whether an input date is earlier than the system date or later than the system date.
rule_Personal_Company_Identification	Parses person names and company names to different ports that the rule specifies. The rule has the following outputs: person name; company name; data category, such as person name or company name; and data that the rule cannot parse.
rule_Postive_Number_Validation	Verifies that the input data is a positive number.
rule_Prepend_Zero_to_Single_Digit	Prepends the numeral "0" to single numeric characters.
rule_Remove_All_Leading_Zeros	Removes all instances of the numeric character "0" from the beginning of a string.
rule_Remove_Apostrophe	Removes apostrophes. The rule merges the text strings on either side of the apostrophe.
rule_Remove_Control_Characters	Removes control characters from text strings. The rule returns a string that contains the control characters and a string that contains the input text without the control characters.
rule_Remove_Extra_Spaces	Replaces all consecutive spaces with a single space and trims leading and trailing spaces.
rule_Remove_Hyphen	Removes hyphens.
rule_Remove_Leading_Zero	Removes a single instance of the numeric character "0" from the beginning of a string.
rule_Remove_Limited_Punctuation	Removes extraneous characters. Extraneous characters include slashes, back slashes, periods, exclamation marks, underscores, and multiple consecutive spaces.
rule_Remove_Non_Numbers	Removes all characters that are not numeric.
rule_Remove_Parentheses	Removes right and left parenthesis symbols.
rule_Remove_Period	Removes periods.
rule_Remove_Period_Parentheses	Removes left and right parentheses and periods.
rule_Remove_Punctuation	Removes punctuation symbols.
rule_Remove_Punctuation_and_Space	Removes all punctuation and all space characters.
rule_Remove_Quotation	Removes quotation marks.
rule_Remove_Slashes	Removes forward slashes and back slashes.
rule_Remove_Space	Removes all character spaces.
rule_Replace_Ampersand_With_Space	Replaces ampersands with spaces.
rule_Replace_Hyphen_Underscore_with_Space	Replaces hyphens and underscores with spaces.
rule_Replace_Hyphen_with_Space	Replaces hyphens with spaces.
rule_Replace_Limited_Punct_with_Space	Replaces the following punctuation characters with a single space: dash, back slash, period, exclamation mark, and underscore. The rule also replaces two, three, and four consecutive spaces with a single space.
rule_Replace_Non_Alphabetic_with_Space	Replaces numerals and punctuation characters with a single space.
rule_Replace_Period_With_Space	Replaces periods with a single space.
rule_Replace_Punctuation_with_Space	Replaces all punctuation with spaces.
rule_Replace_Slashes_With_Space	Replaces forward slashes and back slashes with spaces.
rule_Reverse_String_Input	Reverses the order of characters in input strings.
rule_String_Completeness	Checks a string for completeness. The rule also searches the input strings for values in the reference table string_default_values_infa. The reference table contains values such as NA, DEFAULT, and XX. If an input string contains a value in the reference table, the rule identifies the string as incomplete.
rule_TitleCase	Converts strings to title case. In title case strings, the first letter of each word is capitalized.
rule_Translate_Diacritic_Characters	Replaces diacritic characters with ASCII equivalents. For example, the rule converts "ã" to "a".
rule_UpperCase	Returns all alphabetic characters in upper case.
rule_URL_Validation	Validates the format and structure of a URL.
rule_Years_Since_Date_of_Birth	Calculates the number of years since the input date.
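
The Luhn check that rule_Luhn_Algorithm and rule_CreditCard_Number_Validation are described as applying can be illustrated outside the product. The following Python sketch is not the rule's implementation. The function name luhn_is_valid is hypothetical, and the handling of spaces and hyphens is an assumption made for this example.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the numeric string passes the Luhn checksum."""
    # Assumption for this sketch: spaces and hyphens are ignored.
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk from the rightmost digit (the check digit) to the left.
    for position, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if position % 2 == 1:
            # Double every second digit; subtract 9 if the result exceeds 9.
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit

    # The number is valid when the total is a multiple of 10.
    return total % 10 == 0


def credit_card_label(number: str) -> str:
    """Map the boolean result to the "Valid" / "Invalid" labels."""
    return "Valid" if luhn_is_valid(number) else "Invalid"
```

A number that passes this check is only structurally plausible. The check does not confirm that the card exists or that a card issuer accepts it.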

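The check digit test that rule_EAN13_Algorithm and rule_GTIN_Validation are described as performing follows the standard GTIN weighting scheme. The following Python sketch shows that scheme under stated assumptions. It is not the rule's implementation, and the function name gtin_label is hypothetical.

```python
# Lengths that the GTIN rule is described as accepting.
GTIN_LENGTHS = {8, 12, 13, 14}


def gtin_label(code: str) -> str:
    """Return "Valid" if the final digit is the correct GTIN check digit."""
    # Assumption for this sketch: the input contains digits only.
    if len(code) not in GTIN_LENGTHS or not (code.isascii() and code.isdigit()):
        return "Invalid"

    total = 0
    # Counting from the right, the check digit has weight 1, and the
    # weights then alternate 3, 1, 3, 1 toward the left.
    for position, ch in enumerate(reversed(code)):
        weight = 3 if position % 2 == 1 else 1
        total += int(ch) * weight

    # With the check digit included, the weighted sum is a multiple of 10.
    return "Valid" if total % 10 == 0 else "Invalid"
```

An International Article Number is the thirteen-digit case, so the same function covers the EAN-13 check in this sketch.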