Populations and Controls

Advanced Controls

This section describes the more advanced SSA-NAME3 Controls.

UNICODE_ENCODING

The ssan3_get_keys_encoded, ssan3_get_ranges_encoded, and ssan3_match_encoded calls have a Field Data Type parameter that can be used to specify the encoding type. You can use this control with the ssan3_get_keys, ssan3_get_ranges, and ssan3_match function calls. The section on UTF-8 considerations is relevant to all API function calls.
This Control is used in the ssan3_get_keys, ssan3_get_ranges, and ssan3_match function calls, and instructs MDM Registry Edition to accept Unicode data input. It can also be used to specify non-Unicode encodings as described in the following table.
Value                  Description                                             Short form
TEXT                   Data is not in Unicode encoding
UTF-8 or UTF8          Unicode UTF-8 format                                    Y or 8
UTF-16 or UTF16        Unicode UTF-16 format                                   6
UTF-16LE or UTF16LE    Unicode UTF-16 format Little Endian                     L
UTF-16BE or UTF16BE    Unicode UTF-16 format Big Endian                        B
UTF-32 or UTF32        Unicode UTF-32 format                                   4
UCS-2 or UCS2          Same as Unicode UTF-16 format
UCS-4 or UCS4          Unicode UTF-32 format                                   4
CP932                  Japanese CP932 code page (shift-JIS)                    J
CP936                  Chinese CP936 code page (GBK or Simplified Chinese)     S
CP949                  Korean CP949 code page                                  K
CP950                  Chinese CP950 code page (Big5 or Traditional Chinese)   T
CP300                  DBCSHost Japanese                                       D
DBCSHOST               DBCSHost Japanese
EUC                    Extended UNIX Code                                      E
If the Field Data Type is W, the UTF-16 Unicode encoding is automatically set.
All call parameters use single-byte characters except for Key Field Data in the ssan3_get_keys and ssan3_get_ranges calls, and Search Data and File Data in the ssan3_match call.
When passing Unicode data, all length and offset values must be given as a number of bytes, not a number of characters. UTF-8 is a variable-length encoding, so the number of bytes needed to represent a given number of characters varies with the content of the data.
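As a minimal sketch of the difference between character counts and byte counts, the following Python snippet encodes one sample name in several encodings and measures each result. The sample string and variable names are illustrative only and are not part of the SSA-NAME3 API.

```python
# Illustrative only: why lengths and offsets must be counted in bytes.
name = "Jos\u00e9 M\u00fcller"  # "José Müller": 11 characters, two of them non-ASCII

char_count = len(name)
utf8_bytes = len(name.encode("utf-8"))       # 1 byte per ASCII character, 2 each for é and ü
utf16_bytes = len(name.encode("utf-16-le"))  # 2 bytes per character here, no byte-order mark
utf32_bytes = len(name.encode("utf-32-le"))  # 4 bytes per character, no byte-order mark

print(char_count, utf8_bytes, utf16_bytes, utf32_bytes)  # prints: 11 13 22 44
```

The same 11-character value therefore needs a different length argument for each encoding.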
You must use the Scatter/Gather Data Format, except with UTF-8 encoding, in which case the following considerations apply.

UTF-8 Considerations

If the UNICODE UTF-8 encoding is being used and care is taken, all parameters can be treated as Unicode.
All return parameters are single-byte UTF-8 characters, except for data from the MDM Registry Edition Info calls, which might contain national characters.
For the Tagged Data Format, the UTF-8 encoding can be used if the delimiter consists of single-byte UTF-8 characters. All the standard (unaccented) letters (A-Z and a-z), digits (0-9), and many punctuation characters are represented in UTF-8 as a single byte.
In UTF-8, if the character code is less than 128 (00 to 7F in hexadecimal), it is a single-byte character. Otherwise, it is a multibyte character. The original UTF-8 definition allowed sequences of 2 to 6 bytes; the current standard (RFC 3629) limits them to 4 bytes.
Everything between the delimiter characters will be passed unchanged.
It is also possible to use a multibyte UTF-8 character or characters for the DELIMITER.
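The single-byte rule above can be checked before choosing a delimiter. The following is a minimal sketch; the helper names `is_single_byte_utf8` and `utf8_length` are hypothetical and are not part of SSA-NAME3.

```python
def is_single_byte_utf8(text: str) -> bool:
    """Return True if every character encodes to exactly one byte in UTF-8.

    This holds only for character codes below 128 (hex 00 to 7F).
    """
    return all(ord(ch) < 128 for ch in text)


def utf8_length(text: str) -> int:
    """Return the length of the text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


# "*" is a single-byte character; the section sign (U+00A7) takes two bytes.
print(is_single_byte_utf8("*"), utf8_length("*"))
print(is_single_byte_utf8("\u00a7"), utf8_length("\u00a7"))
```

A delimiter that fails the first check can still be used, as noted above, but its length must then be counted in bytes.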

PURPOSE <expression>

The PURPOSE Control specifies the name of the Matching Purpose to use in the Match call.
It takes the following form:
PURPOSE=(<expression>)
Where <expression> is one of the following formats:
<expression> := <Purpose_Name>
<expression> := <Purpose_Name>(<Match_level>)
<expression> := not <expression>
<expression> := <expression> or <expression>
<expression> := <expression> and <expression>
<expression> := (<expression>)
The <expression> requires parentheses () whenever there are embedded spaces, especially when you use not, and, and or.
The simple form of the Purpose <expression>:
PURPOSE=<Purpose_Name>
is also the most commonly used format. For example:
PURPOSE=Address
causes a match to be done on the supplied Address fields to determine the match purpose "same address."
The second form of the <expression> attaches a Match Level to the Purpose:
PURPOSE=<Purpose_Name>(<Match_level>)
For example:
PURPOSE=Address(Conservative)
is the same as specifying:
PURPOSE=Address MATCH_LEVEL=Conservative
However, if both ways of specifying a Match Level are used, the value specified locally with each Purpose takes precedence over the match level specified by the MATCH_LEVEL keyword. Purposes that do not have an individual match level specified adopt the match level specified by the MATCH_LEVEL keyword.
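The precedence rule can be illustrated with a small sketch that works out which Match Level applies to each Purpose in an expression. The function `effective_match_levels` is a hypothetical helper, not an SSA-NAME3 call, and it assumes that Purpose names and Match Levels consist only of letters, digits, and underscores.

```python
import re

# Matches a Purpose name with an optional (Match_level) suffix.
# The lookahead skips the operators and, or, and not.
_PURPOSE_RE = re.compile(
    r"\b(?!(?:and|or|not)\b)([A-Za-z_]\w*)(?:\(\s*([A-Za-z_]\w*)\s*\))?",
    re.IGNORECASE,
)


def effective_match_levels(expression, match_level_keyword=None):
    """Map each Purpose in the expression to the Match Level that applies.

    A level given locally, as in Address(Conservative), wins over the
    MATCH_LEVEL keyword; Purposes without a local level adopt the keyword value.
    """
    levels = {}
    for name, local_level in _PURPOSE_RE.findall(expression):
        levels[name] = local_level or match_level_keyword
    return levels


print(effective_match_levels("(Address(Conservative) AND Resident)", "Typical"))
```

In this example, Address keeps its local Conservative level and Resident adopts Typical from the keyword.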
Combining multiple expressions gives the application designer more flexibility in choosing how match decisions and scores are computed.

Multi-Purpose Matching

Another use for combining multiple expressions is to achieve Multi-Purpose Matching. When multiple Purposes are used, it is important to note the following:
  • The Multi-Purpose expression is evaluated in a strict left to right order.
  • Early exit from the match process is possible after evaluation of the first Purpose.
All Purposes in the expression share the same data as passed in the Search Data and File Data fields.
If two Purposes share the same field and both Purposes must be evaluated, the field is evaluated twice. This is because the field might have been defined with different match options depending on which Purpose it is in.
One example of Multi-Purpose matching is to create an early exit condition that is likely to increase performance. This would be either an early "accept" where Purposes are joined by an OR, or an early "reject" in the case of an AND. Again, note that early exit from the match process is possible after evaluation of the first Purpose.
For example, if in a typical Resident purpose it is logically possible to reject on the basis of "Address" not passing a Conservative match, the following expression might be used:
PURPOSE=(Address(Conservative) AND Resident(Typical))
If the Address purpose receives a "Reject" decision from the Conservative match, the Resident Purpose is not evaluated because the AND has failed. However, if the Address purpose does not result in a Reject condition, the Resident purpose is then evaluated. Note that the address data will be rematched as part of the Resident purpose, in addition to the Person_Name field.
In this example, the more often early exit occurs, the more the overall performance of an online system or the run time of a batch job improves. Conversely, performance can decrease the more often both Purposes need to be evaluated (because the address field must be evaluated for each Purpose).
Another example of Multi-Purpose matching is to return a superset of matches, such as:
PURPOSE=(Individual OR Resident)
This expression accepts the matches where the same Individual (Name + (Date of Birth or ID Number)) or the same Resident (Name + Address) is present. In this example, the Match_Level Control applies to both Purposes.
A third example is for mixing Match Levels. For example:
PURPOSE=(Contact(Typical) OR Wide_Contact(Conservative))
This expression accepts matches if either Person_Name, Organization_Name, and Address match at the Typical level, or Person_Name and Organization_Name match at the Conservative level.
In the previous examples, again note the performance impact of both Purposes being evaluated: certain fields would be evaluated twice.
When combining Purposes with and, or, and not, the Purposes are evaluated in a left-to-right order. If the score/decision from the first Purpose invalidates the expression, no further processing is done. For more information about the score/decision processing, see the Design Guide.
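The left-to-right evaluation with early exit can be modelled as ordinary short-circuit logic. The following is a simplified sketch, not the SSA-NAME3 implementation: it reduces each Purpose to a hypothetical pass/fail callable and ignores the score/decision processing described in the Design Guide.

```python
from typing import Callable, List, Tuple


def evaluate(operator: str,
             purposes: List[Tuple[str, Callable[[], bool]]]) -> Tuple[bool, List[str]]:
    """Evaluate Purposes left to right and stop as soon as the result is known.

    operator is "AND" or "OR". Each Purpose is a (name, callable) pair whose
    callable returns True when the Purpose does not reject the record.
    Returns the overall result and the names of the Purposes actually evaluated.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    evaluated = []
    result = operator == "AND"  # result for an empty list
    for name, passes in purposes:
        evaluated.append(name)
        result = passes()
        if operator == "AND" and not result:
            break  # early "reject"
        if operator == "OR" and result:
            break  # early "accept"
    return result, evaluated


# Address fails its match, so Resident is never evaluated.
print(evaluate("AND", [("Address", lambda: False), ("Resident", lambda: True)]))
```

Placing the Purpose that is cheapest to evaluate, or most likely to decide the outcome, first in the expression gives the early exit the most opportunity to occur.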

Lightweight Matching

Lightweight matching uses a fast score estimate to reject the obvious mismatches. SSA-NAME3 performs full scoring for the remaining records, which results in improved performance.
Use the following controls to configure lightweight matching:
LWM=Y/N/ONLY
Enables or disables lightweight matching.
Use the value Y to enable lightweight matching. Lightweight matching uses a fast score estimate to reject the obvious mismatches. The records that lightweight matching passes go to full scoring for robust scoring and ranking. SSA-NAME3 returns the full score and the decision to the caller.
If you create system definition files by using the SDF Wizard, lightweight matching is enabled by default.
Use the value N to disable lightweight matching. SSA-NAME3 performs full scoring on all the matching records.
Use the value ONLY to enable lightweight matching and disable full scoring. Lightweight matching returns the estimate as the final score to the caller.
LWM_FIELDS
Specifies the fields to which you want to apply lightweight matching and their weights. At run time, these values override the values that you have defined in the match purpose. Based on the lightweight matching scores, SSA-NAME3 rejects the obvious mismatches. If you do not set any value, SSA-NAME3 retrieves the fields from the match purpose and assigns equal weight to them.
The syntax of the LWM_FIELDS control is as follows:
LWM_FIELDS=<field1>,<weight1>[,...,<fieldn>,<weightn>]
where field is a valid field name that you have defined in the Purpose control, and weight is the relative significance of the specified field (0-100) when compared to the other fields.
For example,
LWM_FIELDS=Person_Name,5,Address_Part1,1
Lightweight matching is most useful for fields that have low variation, such as addresses. It is less effective for fields with high variation, where SSA-NAME3 handles the variations through the Edit-list and lightweight matching might incorrectly reject records.
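To make the LWM_FIELDS syntax concrete, the following sketch parses the value that follows LWM_FIELDS= into field and weight pairs and checks the documented 0-100 weight range. The function `parse_lwm_fields` is a hypothetical validation helper for application code and is not provided by SSA-NAME3.

```python
from typing import List, Tuple


def parse_lwm_fields(value: str) -> List[Tuple[str, int]]:
    """Parse "<field1>,<weight1>[,...,<fieldn>,<weightn>]" into pairs.

    Raises ValueError if the items are not complete field,weight pairs,
    if a field name is empty, or if a weight is not an integer from 0 to 100.
    """
    parts = [part.strip() for part in value.split(",")]
    if len(parts) < 2 or len(parts) % 2 != 0:
        raise ValueError("expected one or more field,weight pairs")
    pairs = []
    for field, weight_text in zip(parts[0::2], parts[1::2]):
        if not field:
            raise ValueError("field name must not be empty")
        weight = int(weight_text)  # raises ValueError for non-numeric text
        if not 0 <= weight <= 100:
            raise ValueError(f"weight for {field} must be between 0 and 100")
        pairs.append((field, weight))
    return pairs


print(parse_lwm_fields("Person_Name,5,Address_Part1,1"))
```

Because the weights are relative, the pairs above give Person_Name five times the significance of Address_Part1.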
LWM_LIMIT
Specifies the accept and reject limits for the lightweight matching score. Based on the limits, SSA-NAME3 accepts or rejects the search results.
The syntax of the LWM_LIMIT control is as follows:
LWM_LIMIT=<Reject>[,<Accept>]
where Reject and Accept are integer values ranging from 0 through 100.
For example,
LWM_LIMIT=50,90
If LWM=N, the LWM_LIMIT control has no effect.
If LWM=Y, SSA-NAME3 rejects the lightweight matching scores that are less than the reject limit. The accept limit has no effect, and you can omit it.
If LWM=ONLY, SSA-NAME3 rejects the lightweight matching scores that are less than the reject limit. It accepts the scores that are greater than the accept limit. It marks the scores of the records that are greater than or equal to the reject limit and less than the accept limit as undecided.
The default reject limit is 65, and the default accept limit is 90. If you have not set the accept limit and the reject limit is greater than 90, the accept limit is equal to the reject limit.
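The rules for LWM and LWM_LIMIT can be summarised in a short sketch. The function names and the outcome labels below are hypothetical, and one detail is an assumption: the text does not say how a score exactly equal to the accept limit is treated when LWM=ONLY, so this sketch treats it as accepted.

```python
DEFAULT_REJECT = 65
DEFAULT_ACCEPT = 90


def resolve_limits(reject=None, accept=None):
    """Apply the documented defaults to the reject and accept limits."""
    reject = DEFAULT_REJECT if reject is None else reject
    if accept is None:
        # If the accept limit is not set and the reject limit is greater
        # than 90, the accept limit equals the reject limit.
        accept = reject if reject > DEFAULT_ACCEPT else DEFAULT_ACCEPT
    return reject, accept


def lwm_outcome(score, mode, reject=None, accept=None):
    """Return what happens to a record with the given lightweight score."""
    mode = mode.upper()
    if mode == "N":
        return "full scoring"  # LWM_LIMIT has no effect
    if mode not in ("Y", "ONLY"):
        raise ValueError("mode must be Y, N, or ONLY")
    reject_limit, accept_limit = resolve_limits(reject, accept)
    if score < reject_limit:
        return "reject"
    if mode == "Y":
        return "full scoring"  # accept limit is ignored
    # mode == "ONLY"; a score equal to the accept limit is accepted by assumption
    return "undecided" if score < accept_limit else "accept"


print(lwm_outcome(70, "ONLY", reject=50, accept=90))
```

With LWM_LIMIT=50,90 and LWM=ONLY, a lightweight score of 70 falls between the two limits and is reported as undecided.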
