A character set mapping relates each character in a source character set to its counterpart in a destination character set, which enables conversion between the two. Informatica Address Verification uses Unicode internally and supports multiple character sets externally, including UTF-8, ISO 8859-1, GBK, BIG5, JIS, and EBCDIC.
A character set assigns a numeric representation to each character that it supports. Typically, character sets use the same numeric representation for common characters. However, some language-specific characters have different numeric representations in different character sets.
For example, the letter A has the same numeric representation, 65, in both Unicode and the Latin character set ISO 8859-1. However, the letter Å does not have the same representation in every character set. Å is represented by 197 in Unicode and ISO 8859-1, but by 143 in IBM code page 850. Characters that have different numeric representations across character sets do not appear correctly when data that is encoded in one character set is rendered with another.
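The numeric values for A and Å can be checked with a short Python sketch that uses only the standard codecs. The helper name `code_in` is illustrative and is not part of Address Verification, and code page 850 (`cp850`) is chosen here only as an example of an encoding that assigns Å a value different from its Unicode code point.

```python
# Compare the numeric representation of a character across encodings.
# `code_in` is a hypothetical helper for illustration only.

def code_in(ch: str, encoding: str) -> list:
    """Return the byte values that represent `ch` in `encoding`."""
    return list(ch.encode(encoding))

for ch in ("A", "Å"):
    print(ch, "Unicode code point:", ord(ch))
    for enc in ("latin-1", "cp850", "utf-8"):
        print("   ", enc, code_in(ch, enc))

# Bytes written in one character set and read in another do not
# round-trip: the Latin-1 byte for Å means something else in cp850.
misread = "Å".encode("latin-1").decode("cp850")
print("Latin-1 Å read as cp850:", misread)
```

The loop shows that A keeps the value 65 in all three encodings, while Å is a single byte in ISO 8859-1, a different single byte in code page 850, and two bytes in UTF-8.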
To convert data between character sets, Address Verification first converts the input character strings to Unicode. It then uses the mapping for the destination character set to render the data. If the destination character set has no representation for a character, Address Verification maps that character to an underscore character.
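The two-step conversion described above can be sketched in Python. This is a minimal illustration of the idea using Python's standard codecs, not the Address Verification implementation. The function name `convert` and the error-handler name `"underscore"` are assumptions introduced for this example.

```python
import codecs

def _underscore(err):
    """Codec error handler: substitute "_" for each unencodable character."""
    if isinstance(err, UnicodeEncodeError):
        # Replace the whole unencodable run and resume after it.
        return ("_" * (err.end - err.start), err.end)
    raise err

# "underscore" is a hypothetical handler name chosen for this sketch.
codecs.register_error("underscore", _underscore)

def convert(data: bytes, source: str, destination: str) -> bytes:
    """Convert bytes from `source` to `destination` by way of Unicode."""
    text = data.decode(source)                              # step 1: source -> Unicode
    return text.encode(destination, errors="underscore")   # step 2: Unicode -> destination

# Å exists in ISO 8859-1, but the two CJK characters do not.
utf8_input = "Åre 東京".encode("utf-8")
print(convert(utf8_input, "utf-8", "latin-1"))   # b'\xc5re __'

# The cp850 byte 143 becomes the ISO 8859-1 byte 197 for the same letter.
print(list(convert(bytes([143]), "cp850", "latin-1")))   # [197]
```

Routing every conversion through Unicode means that each supported character set needs only one mapping, to and from Unicode, rather than a separate mapping for every pair of character sets.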