
SSA-NAME3 API Reference Guide


UNICODE_ENCODING


The ssan3_get_keys_encoded, ssan3_get_ranges_encoded and ssan3_match_encoded calls have a Field Data Type parameter that can be used to specify the encoding type. This control is only for use with the ssan3_get_keys, ssan3_get_ranges and ssan3_match function calls. The section on UTF8 considerations is relevant to all API function calls.
This control is used in the ssan3_get_keys, ssan3_get_ranges and ssan3_match function calls, and instructs SSA-NAME3 to accept Unicode data input.
Possible values are:
Encoding    Meaning
TEXT        Data is not in a Unicode encoding
UTF-8       Unicode UTF-8 format
UTF-16      Unicode UTF-16 format
UTF-16LE    Unicode UTF-16 format, Little Endian
UTF-16BE    Unicode UTF-16 format, Big Endian
UTF-32      Unicode UTF-32 format
UCS-2       Same as Unicode UTF-16 format
UCS-4       Unicode UTF-32 format
CP932       Japanese CP932 codepage (Shift-JIS)
CP936       Chinese CP936 codepage (GBK or Simplified Chinese)
CP949       Korean CP949 codepage
CP950       Chinese CP950 codepage (Big5 or Traditional Chinese)
DBCSHOST    Japanese - Host, DBCS

All keywords can be specified without the hyphen (-), for example UTF8.
There are short forms for these values:

Encoding    Meaning
Y           Unicode UTF-8 format
8           Unicode UTF-8 format
6           Unicode UTF-16 format
L           Unicode UTF-16LE format
B           Unicode UTF-16BE format
4           Unicode UCS-4 or UTF-32 format
J           Japanese CP932 codepage (Shift-JIS)
S           Chinese CP936 codepage (Simplified Chinese)
K           Korean CP949 codepage
T           Chinese CP950 codepage (Traditional Chinese)
D           Japanese - Host, DBCS
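
The two tables can be folded into a single lookup. The following is a minimal sketch in Python, not part of the SSA-NAME3 API: the names LONG_FORMS, SHORT_FORMS and normalize_encoding are hypothetical helpers, and the sketch assumes exact (case-sensitive) matching because the tables do not say whether lower-case keywords are accepted.

```python
# Long-form keywords, as listed in the first table.
LONG_FORMS = [
    "TEXT", "UTF-8", "UTF-16", "UTF-16LE", "UTF-16BE", "UTF-32",
    "UCS-2", "UCS-4", "CP932", "CP936", "CP949", "CP950", "DBCSHOST",
]

# Short forms, as listed in the second table, mapped to a long form.
SHORT_FORMS = {
    "Y": "UTF-8", "8": "UTF-8", "6": "UTF-16", "L": "UTF-16LE",
    "B": "UTF-16BE", "4": "UTF-32", "J": "CP932", "S": "CP936",
    "K": "CP949", "T": "CP950", "D": "DBCSHOST",
}


def normalize_encoding(value: str) -> str:
    """Return the long-form keyword for a short or long UNICODE_ENCODING value.

    Hypothetical helper: accepts the short forms and the long forms with
    or without the hyphen, and rejects anything else.
    """
    if value in SHORT_FORMS:
        return SHORT_FORMS[value]
    stripped = value.replace("-", "")
    for name in LONG_FORMS:
        if stripped == name.replace("-", ""):
            return name
    raise ValueError(f"unknown UNICODE_ENCODING value: {value!r}")
```

For example, normalize_encoding("8"), normalize_encoding("UTF8") and normalize_encoding("UTF-8") all return "UTF-8".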
All call parameters use single-byte characters except for Key Field Data in the ssan3_get_keys and ssan3_get_ranges calls, and Search Data and File Data in the ssan3_match call.
When passing Unicode data, all length and offset values must be given as a number of bytes, not a number of characters. UTF8 is a variable-length encoding, so the number of bytes needed to represent a given number of characters varies with the content of the data.
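
The difference between character counts and byte counts can be checked before a call is made. The following is a minimal sketch in Python; byte_length and byte_offset are hypothetical helper names, not SSA-NAME3 functions, and the sample name is illustrative only.

```python
def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Return the number of bytes text occupies in the given encoding."""
    return len(text.encode(encoding))


def byte_offset(text: str, char_index: int, encoding: str = "utf-8") -> int:
    """Return the byte offset at which the character at char_index starts."""
    return len(text[:char_index].encode(encoding))


# "José Núñez" written with escapes so the composed code points are explicit.
name = "Jos\u00e9 N\u00fa\u00f1ez"

chars = len(name)                          # 10 characters
utf8_bytes = byte_length(name)             # 13 bytes: three accented letters take 2 bytes each
utf16_bytes = byte_length(name, "utf-16-le")  # 20 bytes: 2 bytes per character, no byte order mark
surname_at = byte_offset(name, 5)          # 6: the surname starts at byte 6, not at character index 5
```

A length or offset passed with UTF8 data would be 13 and 6 here, not 10 and 5.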
Scatter / Gather Data Format should be used, except with UTF8 encoding, in which case the following notes should be considered.

UTF8 Considerations

If the Unicode UTF8 encoding is being used then, as long as great care is taken, all parameters can be treated as Unicode. All return parameters consist of single-byte UTF8 characters, except for data from the SSA-NAME3 v2 Info calls, which may contain national characters.
For Tagged Data Format, the UTF8 encoding can only be used if the delimiter consists of single-byte UTF8 characters. All the standard (non-accented) letters (A-Z and a-z), the digits (0-9) and many punctuation characters are represented in UTF8 as a single 8-bit byte.
In UTF8, if the character code is less than 128 (00 to 7F in hex) then it is a single-byte character; otherwise it is a multi-byte character. Standard UTF8 uses 2 to 4 bytes for these characters; the original UTF8 definition allowed up to 6.
Everything between the delimiter characters will be passed unchanged.
It is also possible to use a multi-byte UTF8 character or characters for the DELIMITER.
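
Whether a candidate delimiter is made up of single-byte UTF8 characters can be tested from its encoded bytes. The following is a minimal sketch in Python, assuming standard UTF8 (sequences of 1 to 4 bytes); utf8_sequence_length and is_single_byte_utf8 are hypothetical helper names, not SSA-NAME3 functions.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the length in bytes of the UTF8 sequence that starts with lead_byte.

    Assumes standard UTF8, where a sequence is 1 to 4 bytes long.
    """
    if lead_byte < 0x80:
        return 1          # 00-7F: single-byte character
    if 0xC2 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4
    # 80-BF are continuation bytes; C0, C1 and F5-FF never appear in valid UTF8.
    raise ValueError(f"0x{lead_byte:02X} cannot start a UTF8 sequence")


def is_single_byte_utf8(text: str) -> bool:
    """Return True if text is non-empty and every character encodes to one byte."""
    return bool(text) and all(b < 0x80 for b in text.encode("utf-8"))
```

For example, is_single_byte_utf8("|") is true, while is_single_byte_utf8("\u00a7") (the section sign) is false because that character encodes to two bytes.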
