The Unicode Standard is the work of the Unicode Consortium, an international body that promotes the interchange of data in all languages. The Unicode Standard is designed to support any language, no matter how many bytes each character in that language may require. Currently, it supports all common languages and provides limited support for other less common languages. The Unicode Consortium is continually enhancing the Unicode Standard with new character encodings. For more information about the Unicode Standard, see http://www.unicode.org.
The Unicode Standard includes multiple character sets. Informatica uses the following Unicode standards:
UCS-2 (Universal Character Set, double-byte). A character set in which each character uses two bytes.
UTF-8 (Unicode Transformation Format). An encoding format in which each character uses between one and four bytes.
UTF-16 (Unicode Transformation Format). An encoding format in which each character uses two or four bytes.
UTF-32 (Unicode Transformation Format). An encoding format in which each character uses four bytes.
GB18030. A Unicode encoding format defined by the Chinese government in which each character uses one, two, or four bytes.
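The byte widths listed above can be checked with a short Python sketch. It is only an illustration using Python's standard codecs and is not part of Informatica. The helper name encoded_lengths is hypothetical. Python has no dedicated UCS-2 codec, so the sketch omits UCS-2; for characters in the Basic Multilingual Plane, the UTF-16 result is the same as UCS-2.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes one character occupies in each encoding.

    The little-endian codec variants are used so that no byte order
    mark is prepended to the encoded output.
    """
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
        "gb18030": len(ch.encode("gb18030")),
    }


# A Latin letter, a CJK ideograph, and a character outside the
# Basic Multilingual Plane (which needs four bytes in UTF-16).
for ch in ["A", "中", "😀"]:
    print(repr(ch), encoded_lengths(ch))
```

The last sample character lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair (four bytes) for it, and a fixed two-byte format such as UCS-2 cannot represent it at all.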
Informatica is a Unicode application. The CDI-PC Client, Data Integration Service, and CDI-PC Integration Service use UCS-2 internally. The CDI-PC Client converts user input from any language to UCS-2 and converts it from UCS-2 before writing to the CDI-PC repository. The CDI-PC Integration Service and Data Integration Service convert source data to UCS-2 before processing and convert it from UCS-2 after processing. The CDI-PC repository, Model repository, Data Integration Service, and CDI-PC Integration Service support UTF-8. You can use Informatica to process data in any language.
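The convert-in, process, convert-out pattern described above can be sketched in Python as follows. This is a conceptual illustration only and does not show Informatica code or its API. The function names to_internal and from_internal, the sample data, and the Latin-1 and UTF-8 code pages are all assumptions, and UTF-16LE restricted to the Basic Multilingual Plane stands in for UCS-2.

```python
def to_internal(data: bytes, source_encoding: str) -> bytes:
    """Decode source bytes and re-encode them as two bytes per character."""
    text = data.decode(source_encoding)
    # UCS-2 has no surrogate pairs, so reject anything beyond U+FFFF.
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the BMP cannot be stored in UCS-2")
    return text.encode("utf-16-le")


def from_internal(internal: bytes, target_encoding: str) -> bytes:
    """Convert the internal two-byte form to the target code page."""
    return internal.decode("utf-16-le").encode(target_encoding)


# Source data arrives in a Latin-1 code page (an assumption for this sketch).
source = "café".encode("latin-1")

# Convert to the internal form before processing.
internal = to_internal(source, "latin-1")

# Stand-in for processing: uppercase the text while it is in internal form.
internal = internal.decode("utf-16-le").upper().encode("utf-16-le")

# Convert from the internal form after processing, here to UTF-8.
result = from_internal(internal, "utf-8")
print(result.decode("utf-8"))
```

Because every character occupies exactly two bytes in the internal form, the four-character sample takes eight bytes internally regardless of the source or target code page.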