Service Group Definition and Customization Guide

Algorithm Definition

An example for Algorithm Definition is as follows:
ALGORITHM-DEFINITION UNCONTROLLED
NAME=PERSON
*    ******
AUTHORISED=N3AUUSP
CLEANING=N3CN
FORMATTING=N3FTEN
STABILIZATION=N3STEN1
CHARSET=N3CS
EDITLIST=N3ELUSP
FREQUENCY-TABLE=N3TBUSP
SCALER-TABLE=N3SCUSP (optional)
NAME-LENGTH=50
ALTERNATE-KEYS=Y
NAME-FORMAT=R
KEY-FORMAT=15
*                ....+....1....+....2....+....3
CLEANING-OPTIONS=NN
*                  ....+....1....+....2....+....3
FORMATTING-OPTIONS=SWNNNNNNNNN
*                     ....+....1....+....2....+....3
STABILIZATION-OPTIONS=
*                 ....+....1....+....2....+....3
SSA-NAME3-OPTIONS=NCNNNNNNNNNNNNNNNNNNNN8NANN
KEYS-STACK-SIZE=20
WORDS-STACK-SIZE=8
SEARCH-TABLE-SIZE=21
REPORT-SIZE=2000

ALGORITHM-DEFINITION
An Algorithm definition starts with this identifying line. An Algorithm is in one of two states: either it is under development and not suitable for use by the application, or it is in use by the application and must not be touched by the Generation programs. These states are controlled by the following options:
GENERATING
NOT-GENERATING (or NON-GENERATING)
UNCONTROLLED
If the Algorithm is being developed and is not yet authorized, specify GENERATING. The Algorithm is then regarded as inaccessible to the application: the generation process will work, but the resulting module cannot be linked into the Service Group for use by an application. GENERATING is also appropriate when there are no plans to use this Algorithm.
If the Algorithm is ready for the production environment and has been authorized, specify NOT-GENERATING or its alias NON-GENERATING. This stops the Algorithm from being regenerated by the generation process and allows it to be linked into the Service Group.
The following table summarizes the actions that can be taken when each option is defined. The Gen, Auth, and Lnk columns indicate which of the generation programs work with the option in question.
| Option | Gen | Auth | Lnk | You are | You may generate |
| --- | --- | --- | --- | --- | --- |
| GENERATING | Yes | Yes | No | Designing a new Algorithm: this may require several passes through the Generation process. While this is being done the Algorithm is not fit for general use, so it cannot be linked into an application. | Everything except the User Service Group |
| NOT-GENERATING | No | No | Yes | Ready to apply a new Algorithm: the Generation and Authorization are OK. You now wish to link the Algorithm with your application and lock the Generation process so the modules cannot be changed. | Only the User Service Group |
| UNCONTROLLED | Yes | Yes | Yes | Living dangerously: this option is a shorthand that allows all steps to be taken. However, you have no protection against your production files being overwritten by a careless invocation of Generation. | Everything |
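As an illustration only, the table above can be written down as a small lookup. This is a sketch in Python, not part of SSA-NAME3; the names `Permissions`, `STATE_PERMISSIONS` and `permissions_for` are hypothetical and exist only for this example.

```python
from typing import NamedTuple


class Permissions(NamedTuple):
    """Which generation programs work for a given state option."""
    generate: bool   # "Gen" column
    authorize: bool  # "Auth" column
    link: bool       # "Lnk" column


# Rows copied from the table in this section.
STATE_PERMISSIONS = {
    "GENERATING": Permissions(generate=True, authorize=True, link=False),
    "NOT-GENERATING": Permissions(generate=False, authorize=False, link=True),
    "UNCONTROLLED": Permissions(generate=True, authorize=True, link=True),
}

# NON-GENERATING is documented as an alias of NOT-GENERATING.
ALIASES = {"NON-GENERATING": "NOT-GENERATING"}


def permissions_for(option: str) -> Permissions:
    """Return the permitted steps for a state option (hypothetical helper)."""
    key = option.strip().upper()
    key = ALIASES.get(key, key)
    if key not in STATE_PERMISSIONS:
        raise ValueError(f"unknown Algorithm state option: {option!r}")
    return STATE_PERMISSIONS[key]
```

For instance, `permissions_for("NON-GENERATING").link` is true while `.generate` is false, which matches the "locked for production" row of the table.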
NAME=
This directive names the Algorithm. The name is used in the Service definition and in the Schemes definition. For example,
NAME=STANDARD
gives this Algorithm the name STANDARD. The name must be no more than eight characters long.
AUTHORIZED= (or AUTHORISED=)
This defines the authorization module name. This module will be generated automatically and you should ensure that the name given is unique.
CLEANING=
FORMATTING=
STABILIZATION= (or STABILISATION=)
CHARSET=
These four options select the Cleaning, Formatting, Word Stabilization and Character-set table modules to be used. Together with the Edit-list, they allow different countries and languages to be supported by SSA-NAME3. These modules are supplied with SSA-NAME3 and pre-defined in the Fast-start country Service Groups. For some countries there is a choice of more than one Character-set table or Stabilization routine. For Character-set tables, this is because different code pages are supported. For Stabilization routines, it may be because of support for different code pages or because alternative Stabilization routines have been developed to handle different levels of data quality. For example, poor data quality may require a more aggressive Stabilization routine.
EDITLIST=
The name of the Edit-list to be used. For example,
EDITLIST=N3ELUSP
specifies that this Algorithm should use the N3ELUSP Edit-list.
FREQUENCY-TABLE=
The name of the frequency table module that will be created by the generation process, for example:
FREQUENCY-TABLE=N3TBUSP
SCALER-TABLE=
The name of the Scaler Frequency table module that will be created by the generation process, for example:
SCALER-TABLE=N3SCUSP
This directive is optional. It only needs to be defined and generated if you use the Scale value returned in the NAMESET search ranges.
NAME-LENGTH=
The length of the names passed to SSA-NAME3 is defined by this directive. Values between 10 and 255 can be used.
ALTERNATE-KEYS=
If you want alternate keys (that is, multiple Positive or Negative keys) to be generated, specify Y for this option:
ALTERNATE-KEYS=Y
Otherwise specify:
ALTERNATE-KEYS=N
For more information on Alternate keys, see the section "Factors which Determine the Number of Keys to Generate per Name".
NAME-FORMAT=
If your names have the major word (for example, the surname) on the right (at the end), specify:
NAME-FORMAT=R
Otherwise specify:
NAME-FORMAT=L
For more information on Name Format, see the section "Factors which Determine the Format of a Name".
KEY-FORMAT=
This controls the gross compression techniques used to build the name key. The syntax is as follows,
KEY-FORMAT=[option]
where [option] is one of the values in the following table.

| Opt | Description |
| --- | --- |
| 15 | Produces keys which cause common names to be more selective than uncommon names. This is the default method. Uncommon words are stabilized, de-voweled and truncated, while common words are only stabilized. Uncommon words can therefore tolerate greater variation in the tail end of the word than common words, and common names achieve greater selectivity than they would if treated the same as uncommon names. |
| 17 | Causes common and uncommon names to be encoded in keys in the same manner, which reduces the overall selectivity of the keys. Both common and uncommon words are stabilized, de-voweled and truncated (in the same way that uncommon words are treated using KEY-FORMAT=15). The benefit is that misspellings or variations in common names are tolerated to the same degree as in uncommon names. The cost is reduced selectivity on common names (that is, more variations of common names reduce to the same key, resulting in more records being selected in a search). This has fairly specific application and is not recommended for general use. |

Advice should be sought from Informatica Corporation technical support before changing this option from the default (KEY-FORMAT=15).
If you change this option you will have to regenerate your Frequency Table (and Scaler Frequency Table if used) and reload keys into the database.
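The trade-off between the two key formats can be illustrated with a toy model. The sketch below is not the SSA-NAME3 key-building algorithm: stabilization is left out entirely, and the common-word set, the truncation length and the function names `compress` and `toy_key` are all hypothetical choices made for this example.

```python
VOWELS = set("AEIOU")

# Hypothetical stand-in for what the Frequency Table would identify as common.
COMMON_WORDS = {"JOHNSON", "SMITH"}

# Hypothetical truncation length, chosen only to keep the example short.
TRUNCATE_AT = 4


def compress(word: str) -> str:
    """De-vowel and truncate a word (stabilization is omitted in this toy)."""
    devoweled = "".join(ch for ch in word if ch not in VOWELS)
    return devoweled[:TRUNCATE_AT]


def toy_key(word: str, key_format: int) -> str:
    """Build a toy key under the rules described for KEY-FORMAT 15 and 17."""
    word = word.upper()
    if key_format == 15:
        # Common words keep their full form; uncommon words are compressed.
        return word if word in COMMON_WORDS else compress(word)
    if key_format == 17:
        # Common and uncommon words are compressed in the same way.
        return compress(word)
    raise ValueError(f"unsupported KEY-FORMAT in this sketch: {key_format}")
```

In this toy, `toy_key("JOHNSON", 17)` and `toy_key("JOHNSEN", 17)` both give `JHNS`, so the misspelling is tolerated, but `JOHNSTON` also collapses to the same key and more records would be selected. With `key_format=15`, `JOHNSON` keeps the full key `JOHNSON` and stays distinct from both.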
KEYS-STACK-SIZE=
Defines the number of entries (maximum 99) to be placed in the Keys-stack. For example,
KEYS-STACK-SIZE=20
allows up to 20 entries. This keyword is optional; if it is omitted, the default of 20 is used.
WORDS-STACK-SIZE=
Defines the number of entries (maximum 99) to be placed in the Words-stack. For example,
WORDS-STACK-SIZE=8
allows up to 8 entries. This keyword is optional; if it is omitted, the default of 8 is used.
SEARCH-TABLE-SIZE=
Defines the number of entries (maximum 99) to be placed in the Search-table. For example,
SEARCH-TABLE-SIZE=21
allows up to 21 entries. This keyword is optional; if it is omitted, the default of 21 is used.
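The defaults and the upper limit for these three keywords could be checked before running Generation. The following is a minimal sketch, assuming the directives have already been read into a dictionary of strings. The helper name `resolve_sizes` is hypothetical, and the lower bound of 1 is an assumption, since only the maximum of 99 is documented.

```python
from typing import Dict

# Defaults as documented for each keyword.
SIZE_DEFAULTS = {
    "KEYS-STACK-SIZE": 20,
    "WORDS-STACK-SIZE": 8,
    "SEARCH-TABLE-SIZE": 21,
}

MAX_ENTRIES = 99  # documented maximum
MIN_ENTRIES = 1   # assumption: the documentation does not state a minimum


def resolve_sizes(directives: Dict[str, str]) -> Dict[str, int]:
    """Apply documented defaults and reject out-of-range sizes."""
    resolved = {}
    for keyword, default in SIZE_DEFAULTS.items():
        raw = directives.get(keyword)
        if raw is None:
            # The keyword is optional; fall back to the documented default.
            resolved[keyword] = default
            continue
        value = int(raw)
        if not MIN_ENTRIES <= value <= MAX_ENTRIES:
            raise ValueError(
                f"{keyword}={value} is outside {MIN_ENTRIES}..{MAX_ENTRIES}"
            )
        resolved[keyword] = value
    return resolved
```

Calling `resolve_sizes({"KEYS-STACK-SIZE": "40"})` keeps the explicit 40 and fills in 8 and 21 for the two omitted keywords.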
REPORT-SIZE= (or EDIT-LIST-SIZE=)
This option tells the Word-Frequency Report generation phase how many of the most common words (in the file being analyzed) should be listed in the output report. Either form of the option can be specified.
EDIT-LIST-PAGE-SIZE=
This option tells the Edit-list generation phase the page size for the hashed Edit-list (the internal structures which represent the Edit-list definitions). The default value of 1024 bytes is large enough for hundreds of thousands of words and should not need changing. Only users with extremely large Edit-lists may need to set this parameter.
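For tooling around the definition file (for example, a pre-check script), an Algorithm definition like the example at the start of this section can be read into a dictionary. The sketch below rests on assumptions drawn from that example rather than on a documented grammar: one directive per line in KEYWORD=VALUE form, lines starting with `*` treated as comments, and the state option following ALGORITHM-DEFINITION on the same line. The function name `parse_algorithm_definition` is hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_algorithm_definition(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Split a definition into its state option and KEYWORD=VALUE directives."""
    state = None
    directives = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and lines starting with '*' carry no directive.
        if not line or line.startswith("*"):
            continue
        if line.startswith("ALGORITHM-DEFINITION"):
            parts = line.split()
            state = parts[1] if len(parts) > 1 else None
            continue
        keyword, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"not a KEYWORD=VALUE line: {raw_line!r}")
        directives[keyword.strip()] = value.strip()
    return state, directives


# Lines taken from the example definition in this section.
SAMPLE = """\
ALGORITHM-DEFINITION UNCONTROLLED
NAME=PERSON
EDITLIST=N3ELUSP
NAME-LENGTH=50
* ....+....1....+....2....+....3
STABILIZATION-OPTIONS=
KEYS-STACK-SIZE=20
"""

state, directives = parse_algorithm_definition(SAMPLE)
```

Here `state` is `UNCONTROLLED`, `directives["NAME"]` is `PERSON`, and the empty `STABILIZATION-OPTIONS=` line is kept as an empty string rather than dropped.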
