Using the Data Quality Accelerator for Crisis Response

General Data Cleansing Rules

Use the general data cleansing rules to parse, standardize, and validate data. Find the general data cleansing rules in the following repository location:

[Project_Name]\Rules\General_Data_Cleansing

The following table describes the general data cleansing rules:

Name	Description
rule_Assign_DQ_Mailability_Score_Description	Assigns a description to the Mailability Score output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_Match_Code_Description	Assigns a description to the Match Code output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_CAN_NER_Field_Identification	Identifies the type of information that an input field contains. The rule can identify names, personal IDs, company names, dates, and Canadian address data. The rule returns a label that describes the type of input data. The rule uses reference data and probabilistic matching techniques to identify the types of information.
rule_Compare_Dates	Calculates the difference between two dates. The rule returns the difference in each of the following units of measure: hours, days, months, and years. Each output value is independent of the other values. You cannot add the outputs together to represent the difference between the date values.
rule_Completeness	Checks a single field for NULL values. Returns "Complete" if the field contains data. Returns "Incomplete" if the field is empty or contains a NULL value.
rule_Completeness_Multi_Port	Checks multiple fields for NULL values. Returns "Complete" if all fields contain data. Returns "Incomplete" if any field is empty or contains a NULL value.
rule_Date_Complete	Verifies that the input string conforms to a date format that the rule recognizes. The rule reads the following reference table: user_defined_dates_infa
rule_Date_of_Birth_Validation	Checks the number of years between a date of birth and the current date. Returns "Adult" or "Minor" in addition to "Valid" if the number of years is 120 or lower. Returns "Invalid" if the number of years is greater than 120.
rule_Date_Parse	Parses date data from a string to a field that the rule specifies. The rule recognizes dates in the following formats: dd/mm/yyyy, mm/dd/yyyy, and yyyy/dd/mm. The rule returns a date and also returns a string that contains the input text without the date.
rule_Date_Standardization	Standardizes date strings to an output format that you specify. To set the output format, open the dq_FormatDate Expression transformation in the rule and update the Output_Date_Format expression variable and the Delimiter expression variable. If the input data does not describe a valid date, the rule returns the digit 0 for each input character.
rule_Date_Validation	Validates date strings that appear in a single format in a data column. To configure the date format that the rule uses for validation, open the dq_ValidateDate Expression transformation in the rule and update the In_Date_Format expression variable. The default format is "MM/DD/YYYY". The rule returns "Valid" or "Invalid".
rule_Date_Validation_Variable_Format	Validates date strings that appear in multiple formats in a data column. Use the rule when a data source includes both a column that contains date values in multiple formats and a column that identifies the format of the date value in each row. If the format column does not identify a date format for a row, the rule applies the format "MM/DD/YYYY" to the date value. The rule reads all data values that the is_date() function recognizes. The rule returns "Valid" or "Invalid".
rule_Days_Between_Dates	Calculates the number of days between two dates.
rule_Days_From_Current_Date	Calculates the number of days between a specified date and the current date.
rule_Field_North_American_Data	Identifies the following types of fields: name, occupation title, company, address, city, state or province, postcode, country, personal ID, email, telephone, credit card, and date. The rule generates a score that indicates the degree of confidence in the field identification. Higher scores indicate greater levels of confidence. If the rule cannot assign a field type, the rule writes the data on the Out_Undetermined field.
rule_IsNumeric	Verifies that the input data is numeric. The rule returns "True" or "False."
rule_LowerCase	Returns all alphabetic characters in lower case.
rule_Negative_Number_Validation	Validates that the input data is a negative number.
rule_Numeric_Completeness	Checks for NULL values in numeric inputs.
rule_Parse_Alpha_Chars_from_Non_Alpha_Chars	Identifies the alphabetic characters and the non-alphabetic characters in an input string and writes each set of characters to a different output field. For example, the rule parses the values "teststring" and "_123" from the input string "teststring_123".
rule_Parse_First_Word	Parses the first word in an input string to a field that the rule specifies.
rule_Parse_Number_At_End_Of_Line	Parses any number that occurs at the end of an input string to a field that the rule specifies. The rule reads strings from left to right.
rule_Parse_Number_At_Start_Of_Line	Parses any number that occurs at the start of an input string to a field that the rule specifies. The rule reads strings from left to right.
rule_Parse_Text_Between_Parentheses	Parses strings that are enclosed in parentheses to a field that the rule specifies. The rule contains an output field for the parsed strings and an output field for the input text without the parsed strings.
rule_Parse_Text_in_Single_Quotes	Parses strings that are enclosed in quotation marks to a field that the rule specifies. When the input data contains multiple quoted elements, the rule parses the final element. The rule reads the input strings from left to right. The rule contains an output field for the parsed strings and an output field for the input text without the parsed strings.
rule_Past_Date_Label	Determines whether an input date is earlier than the system date or later than the system date.
rule_Personal_Company_Identification	Parses person names and company names to different fields that the rule specifies. The rule has the following outputs: the person name, the company name, the data category (such as person name or company name), and any data that the rule cannot parse.
rule_Positive_Number_Validation	Verifies that the input data is a positive number.
rule_Prepend_Zero_to_Single_Digit	Prepends the numeral "0" to single numeric characters.
rule_Remove_All_Leading_Zeros	Removes all instances of the numeric character "0" from the beginning of a string.
rule_Remove_Apostrophe	Removes apostrophes. The rule merges the text strings on either side of the apostrophe.
rule_Remove_Control_Characters	Removes control characters from text strings. The rule returns a string that contains the control characters and a string that contains the input text without the control characters.
rule_Remove_Extra_Spaces	Replaces all consecutive spaces with a single space and trims leading and trailing spaces.
rule_Remove_Hyphen	Removes hyphens from anywhere in the input string.
rule_Remove_Leading_Zero	Removes a single instance of the numeric character "0" from the beginning of a string.
rule_Remove_Limited_Punctuation	Removes extraneous characters. Extraneous characters include slashes, back slashes, periods, exclamation marks, and underscores. The rule also replaces multiple consecutive spaces with a single space.
rule_Remove_Non_Numbers	Removes all characters that are not numeric.
rule_Remove_Parentheses	Removes right and left parenthesis symbols.
rule_Remove_Period	Removes periods.
rule_Remove_Period_Parentheses	Removes left and right parentheses and periods.
rule_Remove_Punctuation	Removes punctuation symbols.
rule_Remove_Punctuation_and_Space	Removes all punctuation and all space characters.
rule_Remove_Quotation	Removes quotation marks.
rule_Remove_Slashes	Removes forward slashes and back slashes.
rule_Remove_Space	Removes all character spaces.
rule_Replace_Hyphen_with_Space	Replaces hyphens with spaces.
rule_Replace_Limited_Punct_with_Space	Replaces the following punctuation characters with a single space: dash, back slash, period, exclamation mark, and underscore. The rule also replaces two, three, and four consecutive spaces with a single space.
rule_Replace_Non_Alphabetic_with_Space	Replaces numerals and punctuation characters with a single space.
rule_String_Completeness	Checks a string for completeness. The rule also searches the input strings for values in the reference table string_default_values_infa. The reference table contains values such as NA, DEFAULT, and XX. If an input string contains a value in the reference table, the rule identifies the string as incomplete.
rule_TitleCase	Converts strings to title case. In title case strings, the first letter of each word is capitalized.
rule_Translate_Diacritic_Characters	Replaces diacritic characters with ASCII equivalents. For example, the rule converts "ã" to "a".
rule_UpperCase	Returns all alphabetic characters in upper case. The input and output fields in the rule use a precision of 200.
rule_UpperCase1000	Returns alphabetic characters in upper case. The input and output fields in the rule use a precision of 1,000.
rule_USA_NER_Field_Identification	Identifies the type of information that an input field contains. The rule can identify names, personal IDs, company names, dates, and United States address data. The rule returns a label that describes the type of input data. The rule uses reference data and probabilistic matching techniques to identify the types of information.
rule_Years_Since_Date_of_Birth	Calculates the number of years since the input date.
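
The rules in the table run inside Informatica Data Quality, and this guide does not show their internal logic. As a rough illustration of the behavior that a few of the descriptions state, the following Python sketch approximates rule_Completeness, rule_Remove_Extra_Spaces, rule_Parse_Alpha_Chars_from_Non_Alpha_Chars, rule_Translate_Diacritic_Characters, and rule_Date_of_Birth_Validation. The function names, the signatures, and the age of 18 that separates "Minor" from "Adult" are assumptions made for the example and do not come from the product.

```python
import re
import unicodedata
from datetime import date


def completeness(value):
    """Approximates rule_Completeness: empty or NULL (None) input is incomplete."""
    return "Incomplete" if value is None or value == "" else "Complete"


def remove_extra_spaces(text):
    """Approximates rule_Remove_Extra_Spaces: collapse runs of spaces, then trim."""
    return re.sub(r" +", " ", text).strip(" ")


def parse_alpha_from_non_alpha(text):
    """Approximates rule_Parse_Alpha_Chars_from_Non_Alpha_Chars.

    Returns a tuple: (alphabetic characters, all other characters).
    """
    alpha = "".join(ch for ch in text if ch.isalpha())
    non_alpha = "".join(ch for ch in text if not ch.isalpha())
    return alpha, non_alpha


def translate_diacritics(text):
    """Approximates rule_Translate_Diacritic_Characters.

    Decomposes each character and drops the combining marks. Characters
    that have no Unicode decomposition are left unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def date_of_birth_validation(dob, today, adult_age=18):
    """Approximates rule_Date_of_Birth_Validation.

    adult_age is an assumption: the table does not state the threshold.
    Returns (validity, age_label); age_label is None for invalid dates.
    """
    # Whole years elapsed; subtract one if the birthday has not occurred yet.
    years = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    if years > 120:
        return "Invalid", None
    return "Valid", "Adult" if years >= adult_age else "Minor"


if __name__ == "__main__":
    print(completeness(None))
    print(remove_extra_spaces("  12   Main    St  "))
    print(parse_alpha_from_non_alpha("teststring_123"))
    print(translate_diacritics("São Paulo"))
    print(date_of_birth_validation(date(2000, 1, 1), date(2024, 6, 1)))
```

The sketch takes the current date as a parameter instead of reading the system clock so that the results are repeatable. The table does not say how the rule treats a date of birth that falls in the future, so the sketch does not special-case it.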
