Using the Data Quality Accelerator for Crisis Response

General Data Cleansing Rules

Use the general data cleansing rules to parse, standardize, and validate data. Find the general data cleansing rules in the following repository location:
[Project_Name]\Rules\General_Data_Cleansing
The following table describes the general data cleansing rules:
Name
Description
rule_Assign_DQ_Mailability_Score_Description
Assigns a description to the Mailability Score output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_Assign_DQ_Match_Code_Description
Assigns a description to the Match Code output from the Address Validator transformation. The description corresponds to the output from Data Quality transformations in releases prior to Data Quality 9.0.
rule_CAN_NER_Field_Identification
Identifies the type of information that an input field contains. The rule can identify names, personal IDs, company names, dates, and Canadian address data. The rule returns a label that describes the type of input data. The rule uses reference data and probabilistic matching techniques to identify the types of information.
rule_Compare_Dates
Calculates the difference between two dates. The rule returns the difference in each of the following units of measure: hours, days, months, and years. Each output value is independent of the other values, so the outputs cannot be added together to represent the difference between the dates.
rule_Completeness
Checks a single field for NULL values. Returns "Complete" if the field contains data. Returns "Incomplete" if the field is empty or contains a NULL value.
rule_Completeness_Multi_Port
Checks multiple fields for NULL values. Returns "Complete" if all fields contain data. Returns "Incomplete" if any field is empty or contains a NULL value.
rule_Date_Complete
Verifies that the input string conforms to a date format that the rule recognizes. The rule reads the following reference table: user_defined_dates_infa
rule_Date_of_Birth_Validation
Checks the number of years between a date of birth and the current date. Returns "Adult" or "Minor" in addition to "Valid" if the number of years is 120 or lower. Returns "Invalid" if the number of years is greater than 120.
rule_Date_Parse
Parses date data from a string to a field that the rule specifies. The rule recognizes dates in the following formats: dd/mm/yyyy, mm/dd/yyyy, and yyyy/dd/mm. The rule returns a date and also returns a string that contains the input text without the date.
rule_Date_Standardization
Standardizes date strings to an output format that you specify. To set the output format, open the dq_FormatDate Expression transformation in the rule and update the Output_Date_Format expression variable and the Delimiter expression variable. If the input data does not describe a valid date, the rule returns the digit 0 for each input character.
rule_Date_Validation
Validates date strings that appear in a single format in a data column. To configure the date format that the rule uses for validation, open the dq_ValidateDate Expression transformation in the rule and update the In_Date_Format expression variable. The default format is "MM/DD/YYYY". The rule returns "Valid" or "Invalid".
rule_Date_Validation_Variable_Format
Validates date strings that appear in multiple formats in a data column. Use the rule when a data source includes both a column that contains date values in multiple formats and a column that identifies the format of the date value in each row. If the format column does not identify a date format for a row, the rule applies the format "MM/DD/YYYY" to the date value. The rule reads all data values that the is_date() function recognizes. The rule returns "Valid" or "Invalid".
rule_Days_Between_Dates
Calculates the number of days between two dates.
rule_Days_From_Current_Date
Calculates the number of days between a specified date and the current date.
rule_Field_North_American_Data
Identifies the following types of fields: name, occupation title, company, address, city, state or province, postcode, country, personal ID, email, telephone, credit card, and date. The rule generates a score that indicates the degree of confidence in the field identification. Higher scores indicate greater levels of confidence. If the rule cannot assign a field type, the rule writes the data to the Out_Undetermined field.
rule_IsNumeric
Verifies that the input data is numeric. The rule returns "True" or "False."
rule_LowerCase
Returns all alphabetic characters in lower case.
rule_Negative_Number_Validation
Validates that the input data is a negative number.
rule_Numeric_Completeness
Checks for NULL values in numeric inputs.
rule_Parse_Alpha_Chars_from_Non_Alpha_Chars
Identifies the alphabetic characters and the non-alphabetic characters in an input string and writes each set of characters to a different output field. For example, the rule parses the values "teststring" and "_123" from the input string "teststring_123".
rule_Parse_First_Word
Parses the first word in an input string to a field that the rule specifies.
rule_Parse_Number_At_End_Of_Line
Parses any number that occurs at the end of an input string to a field that the rule specifies. The rule reads strings from left to right.
rule_Parse_Number_At_Start_Of_Line
Parses any number that occurs at the start of an input string to a field that the rule specifies. The rule reads strings from left to right.
rule_Parse_Text_Between_Parentheses
Parses strings that are enclosed in parentheses to a field that the rule specifies. The rule contains an output field for the parsed strings and an output field for the input text without the parsed strings.
rule_Parse_Text_in_Single_Quotes
Parses strings that are enclosed in single quotation marks to a field that the rule specifies. When the input data contains multiple quoted elements, the rule parses the final element. The rule reads the input strings from left to right. The rule contains an output field for the parsed strings and an output field for the input text without the parsed strings.
rule_Past_Date_Label
Determines whether an input date is earlier than the system date or later than the system date.
rule_Personal_Company_Identification
Parses person names and company names to different fields that the rule specifies. The rule has the following outputs: person name, company name, data category (such as person name or company name), and data that the rule cannot parse.
rule_Positive_Number_Validation
Verifies that the input data is a positive number.
rule_Prepend_Zero_to_Single_Digit
Prepends the numeral "0" to single numeric characters.
rule_Remove_All_Leading_Zeros
Removes all instances of the numeric character "0" from the beginning of a string.
rule_Remove_Apostrophe
Removes apostrophes. The rule merges the text strings on either side of the apostrophe.
rule_Remove_Control_Characters
Removes control characters from text strings. The rule returns a string that contains the control characters and a string that contains the input text without the control characters.
rule_Remove_Extra_Spaces
Replaces all consecutive spaces with a single space and trims leading and trailing spaces.
rule_Remove_Hyphen
Removes hyphens from anywhere in the input string.
rule_Remove_Leading_Zero
Removes a single instance of the numeric character "0" from the beginning of a string.
rule_Remove_Limited_Punctuation
Removes extraneous characters. Extraneous characters include slashes, backslashes, periods, exclamation marks, and underscores. The rule also replaces multiple consecutive spaces with a single space.
rule_Remove_Non_Numbers
Removes all characters that are not numeric.
rule_Remove_Parentheses
Removes right and left parenthesis symbols.
rule_Remove_Period
Removes periods.
rule_Remove_Period_Parentheses
Removes left and right parentheses and periods.
rule_Remove_Punctuation
Removes punctuation symbols.
rule_Remove_Punctuation_and_Space
Removes all punctuation and all space characters.
rule_Remove_Quotation
Removes quotation marks.
rule_Remove_Slashes
Removes forward slashes and backslashes.
rule_Remove_Space
Removes all space characters.
rule_Replace_Hyphen_with_Space
Replaces hyphens with spaces.
rule_Replace_Limited_Punct_with_Space
Replaces the following punctuation characters with a single space: dash, backslash, period, exclamation mark, and underscore. The rule also replaces two, three, and four consecutive spaces with a single space.
rule_Replace_Non_Alphabetic_with_Space
Replaces numerals and punctuation characters with a single space.
rule_String_Completeness
Checks a string for completeness. The rule also searches the input strings for values in the reference table string_default_values_infa. The reference table contains values such as NA, DEFAULT, and XX. If an input string contains a value in the reference table, the rule identifies the string as incomplete.
rule_TitleCase
Converts strings to title case. In title case strings, the first letter of each word is capitalized.
rule_Translate_Diacritic_Characters
Replaces diacritic characters with ASCII equivalents. For example, the rule converts "ã" to "a".
rule_UpperCase
Returns all alphabetic characters in upper case. The input and output fields in the rule use a precision of 200.
rule_UpperCase1000
Returns alphabetic characters in upper case. The input and output fields in the rule use a precision of 1,000.
rule_USA_NER_Field_Identification
Identifies the type of information that an input field contains. The rule can identify names, personal IDs, company names, dates, and United States address data. The rule returns a label that describes the type of input data. The rule uses reference data and probabilistic matching techniques to identify the types of information.
rule_Years_Since_Date_of_Birth
Calculates the number of years since the input date.
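
The rules in the table are built from Data Quality transformations, and their internal logic is not shown here. As a rough illustration of the behavior that four of the descriptions state, the following Python sketch approximates rule_Completeness, rule_Date_of_Birth_Validation, rule_Remove_Extra_Spaces, and rule_Parse_Alpha_Chars_from_Non_Alpha_Chars. The function names, the return shapes, and the adult threshold of 18 years are assumptions made for the example and are not taken from the product.

```python
import re
from datetime import date
from typing import Optional, Tuple

MAX_AGE_YEARS = 120   # upper bound stated in the rule description
ADULT_AGE_YEARS = 18  # assumption: the description does not state the threshold


def completeness(value: Optional[str]) -> str:
    """Approximate rule_Completeness: None (NULL) or empty input is incomplete."""
    return "Incomplete" if value is None or value == "" else "Complete"


def date_of_birth_validation(dob: date, today: date) -> Tuple[str, Optional[str]]:
    """Approximate rule_Date_of_Birth_Validation.

    Returns ("Valid", "Adult" or "Minor") when the age is 120 years or lower,
    and ("Invalid", None) otherwise. Future birth dates are not handled
    specially, because the description does not cover them.
    """
    # Count whole years, subtracting one if the birthday has not occurred yet.
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    years = today.year - dob.year - int(before_birthday)
    if years > MAX_AGE_YEARS:
        return ("Invalid", None)
    return ("Valid", "Adult" if years >= ADULT_AGE_YEARS else "Minor")


def remove_extra_spaces(text: str) -> str:
    """Approximate rule_Remove_Extra_Spaces: collapse runs of spaces, then trim."""
    return re.sub(r" {2,}", " ", text).strip(" ")


def parse_alpha_from_non_alpha(text: str) -> Tuple[str, str]:
    """Approximate rule_Parse_Alpha_Chars_from_Non_Alpha_Chars.

    Returns the alphabetic characters and the remaining characters as two
    strings, each in input order. str.isalpha() also accepts non-ASCII
    letters, which may differ from the product's behavior.
    """
    alpha = "".join(ch for ch in text if ch.isalpha())
    non_alpha = "".join(ch for ch in text if not ch.isalpha())
    return alpha, non_alpha
```

For the example input in the table, parse_alpha_from_non_alpha("teststring_123") returns ("teststring", "_123"). Passing the current date as an argument to date_of_birth_validation, instead of reading the system clock inside the function, keeps the sketch repeatable.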
