SSA-NAME3 API Reference Guide

10.1 HotFix 1

UNICODE_ENCODING

The ssan3_get_keys_encoded, ssan3_get_ranges_encoded and ssan3_match_encoded calls have a Field Data Type parameter that can be used to specify the encoding type. This control is only for use with the ssan3_get_keys, ssan3_get_ranges and ssan3_match function calls. The section on UTF8 considerations is relevant to all API function calls.

This control is used in the ssan3_get_keys, ssan3_get_ranges and ssan3_match function calls, and instructs SSA-NAME3 to accept Unicode data input.

Possible values are:

Encoding	Meaning
TEXT	Data is not in a Unicode encoding
UTF-8	Unicode UTF-8 format
UTF-16	Unicode UTF-16 format
UTF-16LE	Unicode UTF-16 format Little Endian
UTF-16BE	Unicode UTF-16 format Big Endian
UTF-32	Unicode UTF-32 format
UCS-2	Same as Unicode UTF-16 format
UCS-4	Unicode UTF-32 format
CP932	Japanese CP932 Codepage (Shift-JIS)
CP936	Chinese CP936 Codepage (GBK or Simplified Chinese)
CP949	Korean CP949 Codepage
CP950	Chinese CP950 Codepage (Big5 or Traditional Chinese)
DBCSHOST	Japanese - Host, DBCS

All keywords can be specified without the hyphen (-), for example UTF8.

There are also short forms for these values:

Encoding	Meaning
Y	Unicode UTF-8 format
8	Unicode UTF-8 format
6	Unicode UTF-16 format
L	Unicode UTF-16LE format
B	Unicode UTF-16BE format
4	Unicode UCS-4 or UTF-32 format
J	Japanese CP932 codepage (Shift-JIS)
S	Chinese CP936 codepage (Simplified Chinese)
K	Korean CP949 codepage
T	Chinese CP950 codepage (Traditional Chinese)
D	Japanese - Host, DBCS
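
The two tables above can be folded into a single lookup. The following is a minimal sketch in Python, used here only for illustration; the helper name `normalize_encoding` is hypothetical and is not part of the SSA-NAME3 API. It assumes exact, case-sensitive matching, which the tables do not state either way, and it maps the short form 4 to UTF-32 because the table lists UCS-4 and UTF-32 as the same format.

```python
# Long-form keywords, copied from the "Possible values" table.
LONG_FORMS = [
    "TEXT", "UTF-8", "UTF-16", "UTF-16LE", "UTF-16BE", "UTF-32",
    "UCS-2", "UCS-4", "CP932", "CP936", "CP949", "CP950", "DBCSHOST",
]

# Short forms, copied from the short-form table.
SHORT_FORMS = {
    "Y": "UTF-8", "8": "UTF-8", "6": "UTF-16",
    "L": "UTF-16LE", "B": "UTF-16BE", "4": "UTF-32",
    "J": "CP932", "S": "CP936", "K": "CP949", "T": "CP950",
    "D": "DBCSHOST",
}

# Keywords may be written without the hyphen, e.g. UTF8 for UTF-8.
HYPHENLESS = {name.replace("-", ""): name for name in LONG_FORMS}


def normalize_encoding(value: str) -> str:
    """Return the long-form keyword for a long, hyphenless or short value.

    Hypothetical helper for illustration; matching is case-sensitive
    (an assumption, not documented behaviour).
    """
    if value in LONG_FORMS:
        return value
    if value in SHORT_FORMS:
        return SHORT_FORMS[value]
    if value in HYPHENLESS:
        return HYPHENLESS[value]
    raise ValueError(f"unknown encoding keyword: {value!r}")
```

Note that the short form T selects CP950 (Traditional Chinese), not TEXT, so a lookup of this kind has to check the short-form table rather than guess from the first letter.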

All call parameters use single-byte characters except for Key Field Data in the ssan3_get_keys and ssan3_get_ranges calls, and Search Data and File Data in the ssan3_match call.

When passing Unicode data, all length and offset values must be expressed as a number of bytes, not a number of characters. UTF8 is a variable-length encoding, so the number of bytes needed to represent a given number of characters varies with the content of the data.
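
The difference between byte counts and character counts can be seen in a minimal Python sketch. The helper names `byte_length` and `byte_offset` are hypothetical and only illustrate how a caller might derive the values; they are not part of the SSA-NAME3 API. The encoding names are Python codec names, not SSA-NAME3 keywords.

```python
def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def byte_offset(text: str, char_index: int, encoding: str = "utf-8") -> int:
    """Byte offset of the character at char_index in the encoded text."""
    return len(text[:char_index].encode(encoding))


name = "José Núñez"

# Ten characters, but three of them (é, ú, ñ) take two bytes each in UTF-8.
chars = len(name)
utf8_bytes = byte_length(name)

# "Núñez" starts at character 5, which is byte 6 because é is two bytes.
surname_offset = byte_offset(name, 5)

# "utf-16-le" is used so that Python does not prepend a byte order mark.
utf16_bytes = byte_length(name, "utf-16-le")
```

Here the same ten-character name occupies 13 bytes in UTF-8 and 20 bytes in UTF-16LE, and the surname begins at byte offset 6, not 5.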

Scatter / Gather Data Format should be used, except with UTF8 encoding, in which case the following notes should be considered.

UTF8 Considerations

If the Unicode UTF8 encoding is being used then, as long as great care is taken, all parameters can be treated as Unicode. All return parameters consist of single-byte UTF8 characters, except for data from the SSA-NAME3 v2 Info calls, which may contain national characters.

For Tagged Data Format, the UTF8 encoding can only be used if the delimiter consists of single-byte UTF8 characters. All the standard unaccented letters (A-Z and a-z), the digits (0-9) and many punctuation characters are represented in UTF8 as a single 8-bit byte.

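In UTF8, if the character code is less than 128 (00 to 7F in hex) then it is a single-byte character; otherwise it is a multi-byte character. The original UTF8 definition allowed sequences of 2 to 6 bytes; the current standard (RFC 3629) limits them to 2 to 4 bytes.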
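
The single-byte rule can be checked programmatically before a delimiter is chosen. The sketch below is in Python for illustration only; both function names are hypothetical and not part of the SSA-NAME3 API. It assumes the current UTF-8 definition (RFC 3629), in which a sequence is at most 4 bytes long.

```python
def is_single_byte_delimiter(delimiter: str) -> bool:
    """True if every character of the delimiter has a code below 128,
    so that each one is encoded in UTF-8 as exactly one byte."""
    return len(delimiter) > 0 and all(ord(ch) < 0x80 for ch in delimiter)


def utf8_sequence_length(lead_byte: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with lead_byte.

    Follows RFC 3629 (1 to 4 bytes); continuation bytes and other
    invalid lead bytes raise ValueError.
    """
    if lead_byte < 0x80:
        return 1
    if 0xC2 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4
    raise ValueError(f"not a valid UTF-8 lead byte: 0x{lead_byte:02X}")
```

For example, `is_single_byte_delimiter("*")` is true, while `is_single_byte_delimiter("§")` is false because § (U+00A7) is encoded as two bytes.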

Everything between the delimiter characters will be passed unchanged.

It is also possible to use a multi-byte UTF8 character or characters for the DELIMITER.
