The Unicode Standard is the work of the Unicode Consortium, an international body that promotes the interchange of data in all languages. The Unicode Standard is designed to support any language, no matter how many bytes each character in that language may require. Currently, it supports all common languages and provides limited support for other less common languages. The Unicode Consortium is continually enhancing the Unicode Standard with new character encodings. For more information about the Unicode Standard, see http://www.unicode.org.
The Unicode Standard includes multiple character sets. Informatica uses the following Unicode standards:
UCS-2 (Universal Character Set, double-byte). A character set in which each character uses two bytes.
UTF-8 (Unicode Transformation Format). An encoding format in which each character can use between one to four bytes.
UTF-16 (Unicode Transformation Format). An encoding format in which each character uses two or four bytes.
UTF-32 (Unicode Transformation Format). An encoding format in which each character uses four bytes.
GB18030. A Unicode encoding format defined by the Chinese government in which each character can use between one to four bytes.
Informatica is a Unicode application. The PowerCenter Client, PowerCenter Integration Service, and Data Integration Service use UCS-2 internally. The PowerCenter Client converts user input from any language to UCS-2 and converts it from UCS-2 before writing to the PowerCenter repository. The PowerCenter Integration Service and Data Integration Service converts source data to UCS-2 before processing and converts it from UCS-2 after processing. The PowerCenter repository, Model repository, PowerCenter Integration Service, and Data Integration Service support UTF-8. You can use Informatica to process data in any language.