User Guide

Clustering Definition

This section begins with the Clustering-Definition keyword. The fields are as follows:

Field	Description
NAME=	A character string that identifies the Clustering-Definition. The name must not match the name of any Search-Definition or Multi-Search-Definition in the same Project. This is a mandatory parameter.
CLUSTERING-ID=	A unique two-character ID prefixed to all cluster numbers generated by this Clustering. This is a mandatory parameter. If the first Clustering definition is used for seeding and subsequent Clusterings add to this seeded Clustering, all of these Clusterings should use the same CLUSTERING-ID. See the CLUSTERING-METHOD=SEED section under User-Job-Definition for more information about seeding.
IDX=	The name of the IDX used by the clustering step. If this parameter is not given, the default IDX name "kx" followed by the given CLUSTERING-ID is assumed. The IDX is defined in the IDX-Definition section (see the IDX Definition / NAME= section).
INDEXES-PATH=	The path for Clustering index files. DCE does not support spaces in file or path names.
FORMATTED-FILE-PATH=	Optional path for the formatted data file fmt.tmp. Refer to the Reformat Input Data section.
COMMENT=	An optional character string describing this clustering step.
SEARCH-LOGIC= (alias to KEY-LOGIC=)	This parameter describes the logic to be used to generate search ranges to find candidate records from the IDT. It may differ from the KEY-LOGIC= used to generate keys for the IDT (as defined in the IDX-Definition). Refer to the Search Logic section for details. This is a mandatory parameter.
SCORE-LOGIC=	This parameter describes the normal matching logic used to refine the set of candidate records found by the Search-Logic. This is a mandatory parameter unless at least one of the other SCORE-LOGIC parameters is specified. Refer to the Score Logic section for details.
PRE-SCORE-LOGIC=	This optional parameter describes the lightweight matching logic used to refine the set of candidate records found by the Search-Logic. Refer to the Score Logic section for details.
KEY-SCORE-LOGIC=	This optional parameter describes the normal matching logic used to refine the set of candidate records found by the Key-Logic. Refer to the Search Logic section for details.
KEY-PRE-SCORE-LOGIC=	This optional parameter describes the lightweight matching logic used to refine the set of candidate records found by the Key-Logic. Refer to the Search Logic section for details.
SORTED-FILE-PATH=	Optional path for the temporary sort data file srt.tmp.
SORT-WORK1-PATH=, SORT-WORK2-PATH=	DCE may create sort work files when sorting a large result set. These parameters control the placement of these files and override any values given in the Project-Definition.
KEY-FIELD=	The name of the field in the database file that is used for key generation. This must be a field defined in the File-Definition. We recommend reviewing any use of this keyword and converting it to the newer Field(List of keyfields) Search-Logic/Key-Logic option. For more details, refer to the Search Logic section.
CANDIDATE-SET-SIZE-LIMIT=n	Informs the DCE Search Server to process searches by first building a list of candidate records, eliminating duplicates, and then scoring the remainder. This process usually makes scoring more efficient. n specifies the maximum number of unique entries in the list. The default limit is 10000 records. A value of 0 disables this processing.
SCHEDULE=<list of jobs>	Comma-separated list of jobs scheduled for this clustering. The jobs listed must be defined in the job-definition sections.
UNMATCHED-FILE=	The name of the Logical-File entity that describes the Unmatched File. This file is created when running a clustering job with the No-New-Clusters option. When this parameter is defined, the records that did not match any existing clusters are written to the Unmatched File. An output view may be used to format the output file.
OPTIONS=	A comma-separated list of keywords used to control various search options. The keywords are as follows:

Keyword	Description
ADD-NULL-KEY	SORTIT processing is used to sort the input file into preferred key order. If an input record generates a null key and the IDX-Definition option No-Null-Key has been specified, the record is not written to the output file and therefore is not loaded into the IDT. To load such a record without adding null keys to the key index, specify the Add-Null-Key option. This is useful if the record will be reindexed later using a different field.
APPEND	Appends the input file to an existing file in the database. Use this option to merge two or more input files into one clustering database. Normally, the previous clustering information is erased when a file is loaded to the database.
AUTO-ID	Generates a unique record source ID in the Id-field. This option is needed if source identifiers (Source Id) will be used to identify records.
IGNORE-NOTCH-OVERRIDE	Ignores any adjustments made to the match levels by a search client (Relate or DupFinder) that requests a particular Match-Tolerance. The tolerance is honored but the adjustments are ignored.
DELAY	Delays the building of cluster index 2 (used by POST). If you use this option and want to run a POST job, you must schedule an EXTRACT job to create the index file.
FORMAT	Runs the PRE job to pre-format the raw input data. This option is needed if a job of type PRE is used.
PRE-LOAD	Runs the LOADIT job to preload the input data to the database. This option is needed if a job of type LOADIT is used.
SEARCH-NULL-PARTITION	Any search for a record containing a Null-Partition value searches all other partitions. Any search for a record with a non-null partition value searches the null partition as well. Note that the entire partition value must be null for this to work.
SORT-IN	Runs the SORTIT job to sort the input data. This option is needed if a job of type SORTIT is used.
TRUNCATE-SET	Modifies the behavior of CANDIDATE-SET-SIZE-LIMIT. Searches normally continue until all candidates have been considered. Truncate-Set terminates the search once the candidate set is full, thereby limiting the number of candidates that are considered.
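
The interaction between CANDIDATE-SET-SIZE-LIMIT and TRUNCATE-SET can be illustrated with a small Python sketch. This is not DCE code: the function name `candidates_to_score` and its arguments are hypothetical, and the handling of candidates found after the set is full when TRUNCATE-SET is not specified (they are passed on for scoring without duplicate elimination) is an assumption, because the table only states that the search continues until all candidates have been considered.

```python
from typing import Iterable, List

DEFAULT_LIMIT = 10000  # default CANDIDATE-SET-SIZE-LIMIT stated in the table


def candidates_to_score(found: Iterable[str],
                        limit: int = DEFAULT_LIMIT,
                        truncate_set: bool = False) -> List[str]:
    """Return the candidate record ids that would go on to scoring.

    Hypothetical helper for illustration only; it is not part of DCE.
    """
    if limit == 0:
        # A value of 0 disables candidate-set processing:
        # every candidate is scored as found, duplicates included.
        return list(found)

    seen = set()
    unique: List[str] = []    # the candidate set, at most `limit` entries
    overflow: List[str] = []  # candidates found after the set is full

    for record_id in found:
        if record_id in seen:
            continue  # duplicate of an entry already in the set
        if len(unique) < limit:
            seen.add(record_id)
            unique.append(record_id)
            if truncate_set and len(unique) == limit:
                break  # TRUNCATE-SET: stop the search once the set is full
        else:
            # Assumption: without TRUNCATE-SET the search continues and
            # later candidates are still considered, but not de-duplicated.
            overflow.append(record_id)

    return unique + overflow


found = ["r1", "r2", "r1", "r3", "r2", "r4"]
print(candidates_to_score(found, limit=0))
print(candidates_to_score(found, limit=10))
print(candidates_to_score(found, limit=2, truncate_set=True))
```

With a generous limit the duplicates are removed before scoring. With TRUNCATE-SET and a small limit, candidates found after the set fills are never considered, which trades completeness for a bounded amount of scoring work.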

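Several rules in the field table can be checked mechanically before a Project is loaded. The following Python sketch shows one way to do so, assuming the Clustering-Definition has already been parsed into a dictionary keyed by field name without the trailing `=`. The helper names, the dictionary layout, the exact-match name comparison, and the application of the no-spaces rule to every path field are assumptions made for illustration; DCE itself provides no such functions.

```python
from typing import Dict, Iterable, List

# Path-valued fields from the table. Applying the "no spaces" rule to all
# of them is an assumption; the table states it under INDEXES-PATH.
PATH_FIELDS = ("INDEXES-PATH", "FORMATTED-FILE-PATH", "SORTED-FILE-PATH",
               "SORT-WORK1-PATH", "SORT-WORK2-PATH")
SCORE_FIELDS = ("SCORE-LOGIC", "PRE-SCORE-LOGIC",
                "KEY-SCORE-LOGIC", "KEY-PRE-SCORE-LOGIC")


def default_idx_name(clustering_id: str) -> str:
    """Default IDX name when IDX= is omitted: "kx" plus the CLUSTERING-ID."""
    return "kx" + clustering_id


def check_clustering_definition(fields: Dict[str, str],
                                other_definition_names: Iterable[str] = ()
                                ) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []

    name = fields.get("NAME")
    if not name:
        problems.append("NAME is mandatory")
    elif name in set(other_definition_names):
        # Names of Search-Definitions and Multi-Search-Definitions
        # in the same Project must not be reused.
        problems.append("NAME clashes with another definition: " + name)

    if len(fields.get("CLUSTERING-ID", "")) != 2:
        problems.append("CLUSTERING-ID must be a two-character ID")

    # SEARCH-LOGIC= is mandatory; KEY-LOGIC= is its alias.
    if not (fields.get("SEARCH-LOGIC") or fields.get("KEY-LOGIC")):
        problems.append("SEARCH-LOGIC (or KEY-LOGIC) is mandatory")

    # SCORE-LOGIC is mandatory unless another score-logic field is given.
    if not any(fields.get(f) for f in SCORE_FIELDS):
        problems.append("at least one SCORE-LOGIC parameter is required")

    for field in PATH_FIELDS:
        if " " in fields.get(field, ""):
            problems.append(field + " must not contain spaces")

    return problems


definition = {"NAME": "cl-names", "CLUSTERING-ID": "aa",
              "SEARCH-LOGIC": "(search logic here)",
              "PRE-SCORE-LOGIC": "(score logic here)"}
print(check_clustering_definition(definition, ["search-names"]))
print(definition.get("IDX", default_idx_name(definition["CLUSTERING-ID"])))
```

Returning a list of messages rather than raising on the first failure lets all problems in a definition be reported in one pass.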