Operations Guide



Database Support for UNICODE

There are two main ways that databases support UNICODE characters:

Database Level

Some databases store all columns of all tables as UNICODE. This allows multiple database clients to use different character sets and have their data stored without loss since UNICODE is a superset of all client character sets.
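The lossless behaviour described above can be illustrated with a minimal sketch. It uses only Python's standard codecs and does not touch a database. The sample strings, the two client code pages, and the function name round_trip are assumptions chosen for illustration.

```python
# Hypothetical client data in two different legacy character sets.
CLIENT_SAMPLES = {
    "cp1252": "Café",       # a Western European client
    "shift_jis": "日本語",   # a Japanese client
}


def round_trip(text: str, client_charset: str) -> bytes:
    """Simulate client -> UNICODE store -> client for a single value."""
    client_bytes = text.encode(client_charset)     # bytes as the client sends them
    as_unicode = client_bytes.decode(client_charset)  # converted to UNICODE on the way in
    stored = as_unicode.encode("utf-8")            # one possible storage encoding
    # On retrieval the stored value is converted back to the client's character set.
    return stored.decode("utf-8").encode(client_charset)


for charset, text in CLIENT_SAMPLES.items():
    assert round_trip(text, charset) == text.encode(charset)
    print(charset, "round-trips without loss")
```

Both clients get back exactly the bytes they sent, even though they use unrelated code pages, because every character in either code page has a UNICODE equivalent.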

Column Level

Some databases allow individual columns in a table to be defined as UNICODE while other columns in the same table are not. The names of UNICODE data types usually begin with the letter ’N’ (for National), for example NCHAR, NVARCHAR, and NCLOB.

Oracle

Oracle Database

UNICODE support for Oracle databases may be implemented in two ways by defining:

the database character set as UTF8 so that UTF-8 encoded characters may be stored in all CHAR data types (CHAR, VARCHAR2, CLOB), or

individual columns as UNICODE data types (NCHAR, NVARCHAR2, NCLOB). This allows you to add UNICODE support incrementally for only some specific columns in your tables.

Oracle databases define two character sets when the database is created:

the database character set (NLS_CHARACTERSET), and

the character set used for NCHAR or NVARCHAR columns (NLS_NCHAR_CHARACTERSET). Valid values are UTF8 or AL16UTF16.

The following SQL*Plus script can be used to determine how the database was configured:


select parameter, substr(value,1,20) from NLS_DATABASE_PARAMETERS;

PARAMETER                      SUBSTR(VALUE,1,20)
------------------------------ -------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY
NLS_CURRENCY                   $
NLS_ISO_CURRENCY
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               UTF8
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXF
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZH:T
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXF
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_NCHAR_CHARACTERSET         UTF8
NLS_RDBMS_VERSION              8.1.7.0.0

18 rows selected.

Oracle Client

Although Oracle can store data as UNICODE characters, the client application may not be aware of this because data are converted upon retrieval. The environment variable NLS_LANG defines the character set of the database client. This character set is not necessarily UNICODE, although UNICODE is a valid option.

NLS_LANG has the format X_Y.Z, where X is the value of NLS_LANGUAGE, Y is the value of NLS_TERRITORY, and Z is the value of NLS_CHARACTERSET.
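As a rough illustration of the NLS_LANG layout, the sketch below splits a value into its language, territory, and character set parts. The helper name parse_nls_lang and the sample value AMERICAN_AMERICA.UTF8 are assumptions for illustration, and the sketch only handles the full three-part form described above.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str   # the NLS_LANGUAGE part
    territory: str  # the NLS_TERRITORY part
    charset: str    # the character set part


def parse_nls_lang(value: str) -> NlsLang:
    """Split a full language_territory.charset value into its three parts."""
    locale_part, dot, charset = value.partition(".")
    language, underscore, territory = locale_part.partition("_")
    if not (dot and underscore and language and territory and charset):
        raise ValueError(f"expected language_territory.charset, got {value!r}")
    return NlsLang(language, territory, charset)


# Hypothetical sample value; a real client would read os.environ["NLS_LANG"].
print(parse_nls_lang("AMERICAN_AMERICA.UTF8"))
```

A check like this can be useful in a deployment script to confirm that the client character set is the one the application expects before any data is loaded.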

Multi-byte data in a non-UNICODE column

It is possible to store multi-byte characters in a non-UNICODE database and/or column. Data stored in CHAR/VARCHAR columns is normally translated between the client and server’s character sets when transferred between client and server. But if the client and database are configured to use the same character set, no conversion is performed. This makes it possible to store multi-byte characters within CHAR/VARCHAR columns without interference from the DBMS.
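The pass-through behaviour can be modelled with a small sketch. This is a simplified model in plain Python, not the DBMS's actual conversion logic; the function name transfer, the sample text, and the character set names are assumptions for illustration.

```python
def transfer(data: bytes, client_charset: str, db_charset: str) -> bytes:
    """Model which bytes reach a CHAR/VARCHAR column for data sent by a client."""
    if client_charset == db_charset:
        # Same character set on both sides: the bytes are stored untouched.
        return data
    # Different character sets: the bytes are interpreted in the client's
    # character set and converted, replacing anything that cannot be mapped.
    text = data.decode(client_charset, errors="replace")
    return text.encode(db_charset, errors="replace")


# Multi-byte data that the application really holds (Shift-JIS encoded).
payload = "日本語".encode("shift_jis")

# Client and database both declared as the same single-byte character set:
# the multi-byte bytes survive unchanged.
assert transfer(payload, "latin-1", "latin-1") == payload

# Declared character sets differ: the conversion alters the bytes.
assert transfer(payload, "latin-1", "ascii") != payload
```

The cost of this approach is that the database has no knowledge of the real encoding, so the application alone is responsible for interpreting the stored bytes correctly.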

Microsoft SQL Server

A column defined as a non-UNICODE data type can only store a single code page (character set). The code page is determined by the collation of the column defined at table creation time, or if none was specified, the collation of the database.

Columns defined using UNICODE data types such as NCHAR and NVARCHAR can store and retrieve UNICODE characters. They always use a UCS-2/UTF-16 encoding. Microsoft SQL Server database clients work directly with "raw" UNICODE characters, without translation to a client character set.
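To give a feel for what a 16-bit encoding implies for storage, the following sketch counts the 16-bit code units that a string occupies in UTF-16. It is plain Python and does not query SQL Server; the helper name utf16_code_units and the sample strings are assumptions for illustration.

```python
def utf16_code_units(text: str) -> int:
    """Return the number of 16-bit code units needed to hold text in UTF-16."""
    # Encode without a byte order mark, then count 2-byte units.
    return len(text.encode("utf-16-le")) // 2


# Characters in the Basic Multilingual Plane take one code unit each.
assert utf16_code_units("abc") == 3
assert utf16_code_units("日本語") == 3

# A character outside the Basic Multilingual Plane needs a surrogate pair.
assert utf16_code_units("😀") == 2
```

Supplementary characters therefore occupy two code units, which may be worth keeping in mind when choosing lengths for NCHAR and NVARCHAR columns.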

UDB

For UDB, the database must be created as a UNICODE database. When the database is created with code set UTF-8, UNICODE data is stored in UTF-8 form.

The easiest way to check that this is the case is with the following command:

db2 get database config for mydb

The response will be something like:


					
       Database Configuration for Database mydb

Database configuration release level   = 0x0a00
Database release level                 = 0x0a00
Database territory                     = AU
Database code page                     = 1208
Database code set                      = UTF-8
Database country/region code           = 61

The data file used by the load process will be in UTF-16, which UDB converts to UTF-8.
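UDB performs this conversion itself during the load. Purely to show what a UTF-16 to UTF-8 conversion involves, here is a minimal Python sketch; the function name utf16_to_utf8 and the sample text are assumptions for illustration, and the sketch is not part of the load process.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 bytes to UTF-8 without changing the characters."""
    # The "utf-16" codec honours a leading byte order mark and strips it.
    return data.decode("utf-16").encode("utf-8")


# Hypothetical file content; Python's "utf-16" encoder writes a byte order mark.
sample = "Zoë 日本".encode("utf-16")
converted = utf16_to_utf8(sample)

# The characters are identical; only the byte representation differs.
assert converted.decode("utf-8") == "Zoë 日本"
print(len(sample), "bytes in UTF-16 ->", len(converted), "bytes in UTF-8")
```

The two forms encode the same characters, so the conversion is lossless in both directions.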
