Database Support for UNICODE

There are two main ways that databases support UNICODE characters:
Database Level
Some databases store all columns of all tables as UNICODE. This allows multiple database clients to use different character sets and have their data stored without loss since UNICODE is a superset of all client character sets.
Column Level
Some databases allow individual columns in a table to be defined as UNICODE while other columns are not. The names of UNICODE data types usually begin with the letter ’N’ (for National), for example NCHAR, NVARCHAR, and NCLOB.
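The "superset" property described above can be illustrated outside any database. The following is a minimal sketch, not part of the product: the helper names, sample strings, and code pages (cp1252, shift_jis) are illustrative assumptions. It shows two clients with different character sets sharing one UTF-8 store without loss.

```python
# Sketch only: function names and sample data are hypothetical.

def to_unicode_storage(raw: bytes, client_charset: str) -> bytes:
    """Decode bytes from the client's character set and store them as UTF-8."""
    return raw.decode(client_charset).encode("utf-8")


def from_unicode_storage(stored: bytes, client_charset: str) -> bytes:
    """Decode stored UTF-8 and re-encode it in the client's character set."""
    return stored.decode("utf-8").encode(client_charset)


def round_trips(text: str, client_charset: str) -> bool:
    """Return True if the client's bytes survive storage and retrieval unchanged."""
    raw = text.encode(client_charset)
    stored = to_unicode_storage(raw, client_charset)
    return from_unicode_storage(stored, client_charset) == raw


if __name__ == "__main__":
    print(round_trips("café", "cp1252"))        # Western European client
    print(round_trips("日本語", "shift_jis"))    # Japanese client
```

The reverse does not hold: Japanese text cannot be encoded in cp1252 at all, which is why a single-code-page store cannot serve both clients.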

Oracle

Oracle Database

UNICODE support for Oracle databases may be implemented in two ways by defining:
  • the database character set as UTF8 so that UTF-8 encoded characters may be stored in all CHAR data types (CHAR, VARCHAR2, CLOB), or
  • individual columns as UNICODE data types (NCHAR, NVARCHAR2, NCLOB). This allows you to add UNICODE support incrementally for only some specific columns in your tables.
Oracle databases define two character sets when the database is created:
  • the database character set (NLS_CHARACTERSET), and
  • the character set used for NCHAR or NVARCHAR columns (NLS_NCHAR_CHARACTERSET). Valid values are UTF8 or AL16UTF16.
The following SQL*Plus script can be used to determine how the database was configured:

select parameter, substr(value,1,20) from NLS_DATABASE_PARAMETERS;

PARAMETER                      SUBSTR(VALUE,1,20)
------------------------------ -------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY
NLS_CURRENCY                   $
NLS_ISO_CURRENCY
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               UTF8
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXF
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZH:T
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXF
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_NCHAR_CHARACTERSET         UTF8
NLS_RDBMS_VERSION              8.1.7.0.0

18 rows selected.
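Once the two parameters have been read from NLS_DATABASE_PARAMETERS, interpreting them is a simple lookup. The sketch below assumes the values have already been fetched into a dictionary. The function name and result keys are hypothetical, and the sets list only the character sets named in this section; other Unicode character sets would need to be added.

```python
# Sketch only: describe_unicode_support and its result keys are hypothetical.
# The sets contain only the values mentioned in this section.
DATABASE_UNICODE_CHARSETS = {"UTF8"}
NATIONAL_UNICODE_CHARSETS = {"UTF8", "AL16UTF16"}


def describe_unicode_support(params: dict) -> dict:
    """Report which column families can hold UNICODE data.

    params maps parameter names from NLS_DATABASE_PARAMETERS to their values.
    """
    db_charset = params.get("NLS_CHARACTERSET", "").upper()
    nchar_charset = params.get("NLS_NCHAR_CHARACTERSET", "").upper()
    return {
        # CHAR, VARCHAR2 and CLOB columns follow the database character set.
        "char_columns_unicode": db_charset in DATABASE_UNICODE_CHARSETS,
        # NCHAR, NVARCHAR2 and NCLOB columns follow the national character set.
        "nchar_columns_unicode": nchar_charset in NATIONAL_UNICODE_CHARSETS,
    }
```

For the sample output above, both entries would be true because both parameters are UTF8.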

Oracle Client

Although Oracle can store data as UNICODE characters, the client application may not be aware of this because data are converted upon retrieval. The environment variable NLS_LANG defines the character set of the database client. This character set is not necessarily UNICODE, although UNICODE is a valid option.
NLS_LANG has the format X_Y.Z where
  • X is the value of NLS_LANGUAGE,
  • Y is the value of NLS_TERRITORY, and
  • Z is the value of NLS_CHARACTERSET.
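The X_Y.Z layout can be split mechanically. The following is a minimal sketch with hypothetical names (NlsLang, parse_nls_lang). It assumes the language part contains no underscore and the first period starts the character set. Components that are missing from the value come back as empty strings.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    """The three components of an NLS_LANG value (hypothetical helper type)."""
    language: str   # X: NLS_LANGUAGE
    territory: str  # Y: NLS_TERRITORY
    charset: str    # Z: NLS_CHARACTERSET


def parse_nls_lang(value: str) -> NlsLang:
    """Split an NLS_LANG string of the form X_Y.Z into its parts."""
    # Everything after the first period is the character set.
    rest, _, charset = value.partition(".")
    # The first underscore separates language from territory.
    language, _, territory = rest.partition("_")
    return NlsLang(language, territory, charset)
```

For example, parse_nls_lang("AMERICAN_AMERICA.UTF8") yields the language AMERICAN, the territory AMERICA, and the character set UTF8.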

Multi-byte data in a non-UNICODE column

It is possible to store multi-byte characters in a non-UNICODE database and/or column. Data stored in CHAR/VARCHAR columns is normally translated between the client and server’s character sets when transferred between client and server. But if the client and database are configured to use the same character set, no conversion is performed. This makes it possible to store multi-byte characters within CHAR/VARCHAR columns without interference from the DBMS.
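The pass-through behavior can be simulated in a few lines. This is a sketch under stated assumptions, not how any DBMS is implemented: the transfer function is hypothetical, and Python codec names (latin-1, ascii) stand in for database character sets. It shows UTF-8 bytes surviving when both sides declare the same character set and being destroyed when a conversion takes place.

```python
def transfer(data: bytes, client_charset: str, server_charset: str) -> bytes:
    """Simulate moving CHAR/VARCHAR data from client to server (hypothetical model)."""
    if client_charset == server_charset:
        # Same character set on both sides: bytes pass through untouched.
        return data
    # Different character sets: the bytes are reinterpreted and converted,
    # and anything the target cannot represent is replaced.
    text = data.decode(client_charset, errors="replace")
    return text.encode(server_charset, errors="replace")


if __name__ == "__main__":
    payload = "日本".encode("utf-8")  # multi-byte data the application manages itself

    stored = transfer(payload, "latin-1", "latin-1")
    print(stored.decode("utf-8"))     # the application can still decode its own bytes

    damaged = transfer(payload, "latin-1", "ascii")
    print(damaged)                    # conversion has replaced every byte
```

The design consequence is that the application, not the database, becomes responsible for knowing the real encoding of the column.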

Microsoft SQL Server

A column defined as a non-UNICODE data type can only store a single code page (character set). The code page is determined by the collation of the column defined at table creation time, or if none was specified, the collation of the database.
Columns defined using UNICODE data types such as NCHAR and NVARCHAR can store and retrieve UNICODE characters. They always use a UCS-2/UTF-16 encoding. Microsoft SQL Server (MSQ) database clients work directly with "raw" UNICODE characters, without translation to a client character set.
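The difference between a single-code-page column and a UTF-16 column can be sketched with Python codecs. The helper names are hypothetical, and cp1252 is only an example code page; the actual code page depends on the collation in use.

```python
def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


def utf16_byte_length(text: str) -> int:
    """Number of bytes text occupies in UTF-16 (little-endian, no byte order mark)."""
    return len(text.encode("utf-16-le"))


if __name__ == "__main__":
    print(fits_code_page("café", "cp1252"))    # representable in one code page
    print(fits_code_page("日本語", "cp1252"))   # not representable in that code page
    print(utf16_byte_length("日本語"))          # two bytes per character here
```

Characters outside the Basic Multilingual Plane take four bytes in UTF-16 because they are encoded as a surrogate pair.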

UDB

For UDB, the database must be created as a UNICODE database. When the code set is UTF-8, Unicode data is stored in UTF-8 form.
The easiest way to check that this is the case is with the following command:
db2 get database config for mydb
The response will be something like:
Database Configuration for Database mydb

Database configuration release level = 0x0a00
Database release level               = 0x0a00
Database territory                   = AU
Database code page                   = 1208
Database code set                    = UTF-8
Database country/region code         = 61
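If the check needs to be scripted, the response can be parsed as "name = value" lines. The sketch below assumes the command output has already been captured as text. The function names are hypothetical, and only the code set line shown above is inspected.

```python
def parse_db_config(output: str) -> dict:
    """Turn 'name = value' lines from the configuration listing into a dict."""
    settings = {}
    for line in output.splitlines():
        name, separator, value = line.partition("=")
        if separator:  # skip headings and blank lines that have no '='
            settings[name.strip()] = value.strip()
    return settings


def is_unicode_database(settings: dict) -> bool:
    """Return True if the reported database code set is UTF-8."""
    return settings.get("Database code set", "").upper() == "UTF-8"


if __name__ == "__main__":
    sample = """Database Configuration for Database mydb
Database territory = AU
Database code page = 1208
Database code set = UTF-8
"""
    print(is_unicode_database(parse_db_config(sample)))
```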
The data file used by the load process is in UTF-16, which UDB converts to UTF-8.
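UDB performs this conversion internally. The following sketch only illustrates what the UTF-16 to UTF-8 step amounts to, using a hypothetical helper. It assumes the input starts with a byte order mark; without one, Python's utf-16 codec falls back to the platform's native byte order.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes (with a byte order mark) as UTF-8."""
    # Decoding consumes the byte order mark; encoding to UTF-8 does not add one.
    return data.decode("utf-16").encode("utf-8")


if __name__ == "__main__":
    utf16_record = "日本".encode("utf-16")   # includes a byte order mark
    utf8_record = utf16_to_utf8(utf16_record)
    print(len(utf16_record), len(utf8_record))
```

The byte counts differ between the two forms, so column sizes defined in bytes should be judged against the UTF-8 length rather than the length in the data file.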
