Compatibility between code pages is essential for accurate data movement when the PowerCenter Integration Service runs in the Unicode data movement mode.
A code page can be compatible with another code page, or it can be a subset or a superset of another:
Compatible. Two code pages are compatible when the characters encoded in the two code pages are virtually identical. For example, JapanEUC and JIPSE code pages contain identical characters and are compatible with each other. The PowerCenter repository and PowerCenter Integration Service process can each use one of these code pages and can pass data back and forth without data loss.
Superset. A code page is a superset of another code page when it contains all the characters encoded in the other code page and additional characters not encoded in the other code page. For example, MS Latin1 is a superset of US-ASCII because it contains all characters in the US-ASCII code page as well as additional characters, such as accented Latin letters, that US-ASCII does not encode.
Informatica considers a code page to be a superset of itself and all other compatible code pages.
Subset. A code page is a subset of another code page when all characters in the code page are also encoded in the other code page. For example, US-ASCII is a subset of MS Latin1 because all characters in the US-ASCII code page are also encoded in the MS Latin1 code page.
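The three relationships above can be illustrated with a minimal sketch that compares the character sets of single-byte codecs. It assumes Python's built-in codecs ("ascii", "cp1252", "latin-1") as stand-ins for Informatica code pages; the function names are hypothetical, and this is not how PowerCenter itself determines compatibility.

```python
def encodable_characters(codec):
    """Return every character a single-byte codec assigns to byte values 0-255."""
    chars = set()
    for value in range(256):
        try:
            chars.add(bytes([value]).decode(codec))
        except UnicodeDecodeError:
            # This byte value is unassigned in the codec; skip it.
            continue
    return chars


def is_subset(codec_a, codec_b):
    """True when every character in codec_a is also encoded in codec_b."""
    return encodable_characters(codec_a) <= encodable_characters(codec_b)


def is_superset(codec_a, codec_b):
    """True when codec_a encodes every character in codec_b."""
    return is_subset(codec_b, codec_a)


def is_compatible(codec_a, codec_b):
    """True when both codecs encode exactly the same characters."""
    return is_subset(codec_a, codec_b) and is_subset(codec_b, codec_a)


print(is_subset("ascii", "cp1252"))    # US-ASCII against a Windows Latin-1 codec
print(is_superset("cp1252", "ascii"))
print(is_subset("cp1252", "ascii"))
```

Because a set is always a subset of itself, this sketch also treats each codec as a superset of itself, which matches the rule stated above.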
For accurate data movement, the target code page must be a superset of the source code page. If the target code page is not a superset of the source code page, the PowerCenter Integration Service may not process all characters, resulting in incorrect or missing data. For example, Latin1 is a superset of US-ASCII. If you select Latin1 as the source code page and US-ASCII as the target code page, you might lose character data if the source contains characters that are not included in US-ASCII.
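The following sketch shows the kind of loss described above, using Python's "latin-1" and "ascii" codecs as stand-ins for the source and target code pages. The replacement with "?" is Python's behavior under errors="replace" and is only an illustration; how the PowerCenter Integration Service handles an unsupported character is not specified here.

```python
# "caf\u00e9" ends with an accented e, which Latin-1 encodes but US-ASCII does not.
source_text = "caf\u00e9"

# Target is a superset of the source: the text survives a round trip.
latin1_bytes = source_text.encode("latin-1")
round_trip = latin1_bytes.decode("latin-1")

# Target is not a superset of the source: the unsupported character is replaced.
ascii_bytes = source_text.encode("ascii", errors="replace")
lossy = ascii_bytes.decode("ascii")

print(round_trip == source_text)
print(lossy)
```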
When you install or upgrade a PowerCenter Integration Service to run in Unicode mode, you must ensure code page compatibility among the domain configuration database, the Administrator tool, PowerCenter Clients, PowerCenter Integration Service process nodes, the PowerCenter repository, the Metadata Manager repository, and the machines hosting pmrep and pmcmd. In Unicode mode, the PowerCenter Integration Service enforces code page compatibility between the PowerCenter Client and the PowerCenter repository, and between the PowerCenter Integration Service process and the PowerCenter repository. In addition, when you run the PowerCenter Integration Service in Unicode mode, code pages associated with sessions must have the appropriate relationships:
For each source in the session, the source code page must be a subset of the target code page. The PowerCenter Integration Service does not require code page compatibility between the source and the PowerCenter Integration Service process or between the PowerCenter Integration Service process and the target.
If the session contains a Lookup or Stored Procedure transformation, the database or file code page must be a subset of the target that receives data from the Lookup or Stored Procedure transformation and a superset of the source that provides data to the Lookup or Stored Procedure transformation.
If the session contains an External Procedure or Custom transformation, the procedure must pass data in a code page that is a subset of the target code page for targets that receive data from the External Procedure or Custom transformation.
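The session-level relationships above can be sketched as a small check. The SUPERSETS table, the function names, and the session layout are all hypothetical and exist only for illustration; the table is not Informatica's own compatibility data. For simplicity, the sketch assumes that every source feeds every target and that a single lookup code page sits between them.

```python
# Hypothetical table: for each code page, the code pages treated as its supersets.
# A code page is listed as a superset of itself.
SUPERSETS = {
    "US-ASCII": {"US-ASCII", "MS Latin1", "UTF-8"},
    "MS Latin1": {"MS Latin1", "UTF-8"},
    "UTF-8": {"UTF-8"},
}


def is_subset(code_page, other):
    """True when `other` is listed as a superset of `code_page`."""
    return other in SUPERSETS.get(code_page, {code_page})


def check_session(sources, targets, lookup=None):
    """Return a list of violated relationships; an empty list means none were found."""
    problems = []
    for source in sources:
        for target in targets:
            if not is_subset(source, target):
                problems.append(f"source {source} is not a subset of target {target}")
    if lookup is not None:
        for target in targets:
            if not is_subset(lookup, target):
                problems.append(f"lookup {lookup} is not a subset of target {target}")
        for source in sources:
            if not is_subset(source, lookup):
                problems.append(f"lookup {lookup} is not a superset of source {source}")
    return problems


print(check_session(["US-ASCII"], ["UTF-8"], lookup="MS Latin1"))
print(check_session(["MS Latin1"], ["US-ASCII"]))
```

The first call satisfies every relationship, while the second reports that the source is not a subset of the target.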
Informatica uses code pages for the following components:
Domain configuration database. The domain configuration database must be compatible with the code pages of the PowerCenter repository and Metadata Manager repository.
Administrator tool. You can enter data in any language in the Administrator tool.
PowerCenter Client. You can enter metadata in any language in the PowerCenter Client.
PowerCenter Integration Service process. The PowerCenter Integration Service can move data in ASCII mode and Unicode mode. The default data movement mode is ASCII, which passes 7-bit ASCII or 8-bit ASCII character data. To pass multibyte character data from sources to targets, use the Unicode data movement mode. When you run the PowerCenter Integration Service in Unicode mode, it uses up to three bytes for each character to move data and performs additional checks at the session level to ensure data integrity.
PowerCenter repository. The PowerCenter repository can store data in any language. To store multibyte data in the PowerCenter repository, you can use the UTF-8 code page. The code page for the PowerCenter repository is the same as the database code page.
Metadata Manager repository. The Metadata Manager repository can store data in any language. To store multibyte data in the Metadata Manager repository, you can use the UTF-8 code page. The code page for the Metadata Manager repository is the same as the database code page.
Sources and targets. The sources and targets store data in one or more languages. You use code pages to specify the type of characters in the sources and targets.
PowerCenter command line programs. You must also ensure that the code page for pmrep is a subset of the PowerCenter repository code page and the code page for pmcmd is a subset of the PowerCenter Integration Service process code page.
Most database servers use two code pages: a client code page to receive data from client applications and a server code page to store the data. When the database server is running, it converts data between the two code pages if they are different. In this type of database configuration, the PowerCenter Integration Service process interacts with the database client code page. Thus, code pages used by the PowerCenter Integration Service process, such as the PowerCenter repository, source, or target code pages, must be identical to the database client code page. The database client code page is usually identical to the code page of the operating system on which the PowerCenter Integration Service process runs. The database client code page is a subset of the database server code page.
For more information about specific database client and server code pages, see your database documentation.
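As a rough illustration of comparing an operating system code page with a database client code page, the sketch below normalizes two encoding names through Python's codec registry and checks whether they refer to the same codec. The function name and the client_code_page value are hypothetical; how the client code page is actually configured depends on the database, and Python codec names are only stand-ins for the code page names that Informatica or the database uses.

```python
import codecs
import locale


def same_code_page(name_a, name_b):
    """True when both names resolve to the same Python codec."""
    try:
        return codecs.lookup(name_a).name == codecs.lookup(name_b).name
    except LookupError:
        # At least one name is unknown to Python's codec registry.
        return False


# Encoding the operating system reports for the current locale.
os_code_page = locale.getpreferredencoding(False)

# Hypothetical value: replace with the code page configured for the database client.
client_code_page = "utf-8"

print(os_code_page, same_code_page(os_code_page, client_code_page))
```

Normalizing through the codec registry means that aliases such as "latin-1" and "ISO-8859-1" compare as equal.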