Identity Resolution 10.1
Field | Description |
---|---|
NAME= | A character string identifying the job. This is a mandatory parameter. |
COMMENT= | A text field that describes the job's purpose. |
IDX-LIST= | A comma-separated list of IDX names. It is used with the Load-All-Indexes option to limit loading to the listed IDXs; normally, Load-All-Indexes loads every IDX that has been defined. |
FILE= | Defines the name of the Logical-File entity that describes an input or output file used by this job. |
TYPE={PRE|SORTIT|LOADIT|CLUSTER|EXTRACT|POST} | A character string that describes the type of job. Refer to the Clustering Suite section in the Introduction chapter for more details. This is a mandatory parameter. |
CLUSTERING-METHOD=method | Specifies how records are assigned to clusters in a job of type CLUSTER. The setting is ignored for other job types. The parameter method names one of the supported clustering methods (for example, BEST). |
CHECKPOINT-TIME=n[s|m|h|d] | Instructs the Data Clustering Engine to enter a Wait state after clustering records for n seconds, minutes, hours, or days. If n has no s, m, h, or d unit suffix, it is interpreted as seconds. Refer to the Stopping and Restarting Clustering section for more information on how to use this parameter. |
STATUS-TIME=n[s|m|h|d] | Instructs the Data Clustering Engine to write a status report after clustering records for n seconds, minutes, hours, or days. If n has no s, m, h, or d unit suffix, it is interpreted as seconds. |
INPUT-SELECT=n, INPUT-SELECT=[Count(n),] [Skip(n),] [Sample(n)] | Defines input file processing options. In the first form, n is the number of records to read from the input file; Count(n) is an equivalent way to specify this. The value n must be a positive integer. Skip(n) skips n records before processing begins, and Sample(n) processes every nth record. The CLUSTER step ignores INPUT-SELECT if the data has been preloaded; in that case, specify INPUT-SELECT in the LOADIT step instead. |
INPUT-HEADER= | Specifies the number of bytes to skip at the start of the input file. This is useful for files that contain a fixed-length header before the data records. |
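OPTIONS= | A comma-separated list of option keywords for the job, such as INPUT-APPEND, NO-NEW-CLUSTERS, RE-INDEX, NO-ADD, USE-ATTRIBUTES, SET-ALL-VOTE, SET-NONE-VOTE, SET-HEADERS-VOTE, STATUS-APPEND, and REVERSE-SORTIN (listed as OPT= rows in the table below). |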
CANDIDATE-SET-SIZE-LIMIT=n | Instructs the CLUSTER step to process searches by building a list of candidate records, removing duplicates, and then scoring the remaining records. n specifies the maximum number of unique entries in the list. The default limit is 10000 records; a value of 0 disables this processing. Any candidate that does not fit in the list generates an Audit Trail record of type Overflow. This processing makes scoring more efficient when the same candidate is found more than once. However, it can affect the clustering results if the CLUSTERING-METHOD is sensitive to the order in which records are scored. For example, the BEST method selects the record with the best score, and if two or more records tie for the best score, the first one is selected. Because deduplication can reorder the records, a different record might be selected, and two otherwise identical runs may produce different clustering results. The TRUNCATE-SET option stops the search for candidates once the list is full, which prevents very wide searches. However, if a search is stopped early, there is no guarantee that any of the candidates will be accepted or that the best candidates have been found. |
CANDIDATE-SET-WARNING-LEVEL=n | Sets a warning threshold. If the number of records in a candidate set is greater than or equal to n, an Audit Trail record of type SetWarning is written. The default value is one quarter of CANDIDATE-SET-SIZE-LIMIT. |
CANDIDATE-SET-REPORT-LIMIT=n | The CLUSTER step tabulates the number of records in each candidate set. n sets the size of the largest set for which a separate count is kept. At the end of the CLUSTER job, the tabulation is displayed as a histogram entitled Histogram: ranges - candidates count. The default value is equal to CANDIDATE-SET-SIZE-LIMIT. |
OUTPUT-OPTIONS= | A comma-separated list of options for the POST job. |
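The n[s|m|h|d] syntax shared by CHECKPOINT-TIME and STATUS-TIME can be illustrated with a small Python sketch that converts a value to seconds, treating a bare number as seconds. The function name parse_duration is hypothetical, and the sketch assumes the unit suffix is lowercase and follows the number directly; it is not the Data Clustering Engine's own parser.

```python
import re

# Seconds per unit suffix. A bare number defaults to seconds, as described
# for CHECKPOINT-TIME and STATUS-TIME. (Hypothetical helper, not the engine's parser.)
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_DURATION_RE = re.compile(r"^(\d+)([smhd]?)$")


def parse_duration(value: str) -> int:
    """Convert a value such as '90', '15m' or '2h' into a number of seconds."""
    match = _DURATION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"invalid duration: {value!r}")
    amount, unit = match.groups()
    # An empty unit group means no suffix was given, so seconds are assumed.
    return int(amount) * _UNIT_SECONDS[unit or "s"]


if __name__ == "__main__":
    for raw in ("90", "15m", "2h", "1d"):
        print(raw, "->", parse_duration(raw), "seconds")
```

With this reading, CHECKPOINT-TIME=15m and CHECKPOINT-TIME=900 describe the same interval.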
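The INPUT-SELECT options can be modeled as a filter over the input records. This Python sketch is hypothetical: the function input_select is not part of the product, and both the order in which the options are applied (Skip, then Sample, then Count) and the choice that Sample(n) keeps the first record of each group of n are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def input_select(records: Iterable[T], count: Optional[int] = None,
                 skip: int = 0, sample: int = 1) -> Iterator[T]:
    """Hypothetical model of INPUT-SELECT=[Count(n),] [Skip(n),] [Sample(n)].

    Assumed order: skip the first `skip` records, keep every `sample`-th
    remaining record (starting with the first), then stop after `count`
    records have been kept. count=None means no limit.
    """
    if count is not None and count <= 0:
        raise ValueError("Count(n) must be a positive integer")
    if skip < 0 or sample <= 0:
        raise ValueError("Skip(n) must be >= 0 and Sample(n) must be >= 1")
    # islice(iterable, start, stop, step): start after the skipped records,
    # no stop, and step by the sampling interval.
    selected = islice(records, skip, None, sample)
    # islice(iterable, None) passes everything through when no count is given.
    return islice(selected, count)
```

For example, with records 1 to 20, `list(input_select(range(1, 21), count=3, skip=5, sample=4))` yields `[6, 10, 14]`.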
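The candidate-set behavior described for CANDIDATE-SET-SIZE-LIMIT, CANDIDATE-SET-WARNING-LEVEL, and the TRUNCATE-SET option can be sketched in Python as a deduplicating list with a size cap. The names build_candidate_set and CandidateSetResult are hypothetical, Audit Trail records are modeled as plain strings, and two details are assumptions because the description does not specify them: candidates that overflow the list are dropped after their Overflow record is logged, and a limit of 0 passes every candidate through without deduplication.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class CandidateSetResult:
    candidates: List[str] = field(default_factory=list)  # unique IDs, first-seen order
    audit: List[str] = field(default_factory=list)       # simulated Audit Trail record types


def build_candidate_set(found: Iterable[str], size_limit: int = 10000,
                        warning_level: Optional[int] = None,
                        truncate_set: bool = False) -> CandidateSetResult:
    """Hypothetical model of candidate-set building in the CLUSTER step."""
    result = CandidateSetResult()
    if size_limit == 0:
        # A limit of 0 disables the processing: candidates are not deduplicated.
        result.candidates = list(found)
        return result
    if warning_level is None:
        # Default: one quarter of the size limit (integer rounding is assumed).
        warning_level = size_limit // 4
    seen = set()
    for record_id in found:
        if record_id in seen:
            continue  # duplicate candidate: kept once, so scored once
        if len(result.candidates) >= size_limit:
            if truncate_set:
                break  # TRUNCATE-SET: stop the search once the list is full
            result.audit.append("Overflow")  # candidate does not fit in the list
            continue
        seen.add(record_id)
        result.candidates.append(record_id)
    if len(result.candidates) >= warning_level:
        result.audit.append("SetWarning")
    return result
```

This sketch keeps candidates in first-seen order; the engine's own deduplication may reorder them, which is why order-sensitive methods such as BEST can select a different tied record between runs.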
Parameter | PRE | SORTIT | LOADIT | CLUSTER | EXTRACT | POST |
---|---|---|---|---|---|---|
NAME | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
COMMENT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
TYPE | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
FILE | ![]() | ![]() | ![]() | ![]() | ![]() | |
CLUSTERING-METHOD |  |  |  | ✓ |  |  |
CHECKPOINT-TIME |  |  |  | ✓ |  |  |
STATUS-TIME |  |  |  | ✓ |  |  |
INPUT-SELECT | ![]() | ![]() | ![]() | |||
INPUT-HEADER | ![]() | ![]() | ![]() | |||
OPT=INPUT-APPEND | ![]() | |||||
OPT=NO-NEW-CLUSTERS | ![]() | |||||
OPT=RE-INDEX | ![]() | |||||
OPT=NO-ADD | ![]() | |||||
OPT=USE-ATTRIBUTES | ![]() | |||||
OPT=SET-ALL-VOTE | ![]() | |||||
OPT=SET-NONE-VOTE | ![]() | |||||
OPT=SET-HEADERS-VOTE | ![]() | |||||
OPT=STATUS-APPEND | ![]() | |||||
OPT=REVERSE-SORTIN | ![]() | |||||
CANDIDATE-SET-* |  |  |  | ✓ |  |  |
OUTPUT-OPTIONS | ![]() | ![]() |
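The applicability table can be turned into a simple consistency check for a job definition. In this Python sketch, a job is a hypothetical dictionary of parameter names to values, and only the rows whose job types are certain are encoded: NAME, COMMENT, and TYPE for every job type, and CLUSTERING-METHOD, CHECKPOINT-TIME, STATUS-TIME, and CANDIDATE-SET-* for CLUSTER only. NAME and TYPE are treated as mandatory, as their descriptions state; all other parameters pass unchecked.

```python
from typing import Dict, FrozenSet, List, Optional

JOB_TYPES = frozenset({"PRE", "SORTIT", "LOADIT", "CLUSTER", "EXTRACT", "POST"})
MANDATORY = ("NAME", "TYPE")

# Partial applicability data: only parameters whose job types are certain.
# Parameters not listed here are accepted without a job-type check.
APPLIES_TO: Dict[str, FrozenSet[str]] = {
    "NAME": JOB_TYPES,
    "COMMENT": JOB_TYPES,
    "TYPE": JOB_TYPES,
    "CLUSTERING-METHOD": frozenset({"CLUSTER"}),
    "CHECKPOINT-TIME": frozenset({"CLUSTER"}),
    "STATUS-TIME": frozenset({"CLUSTER"}),
}


def allowed_job_types(param: str) -> Optional[FrozenSet[str]]:
    """Job types that use `param`, or None if this sketch does not know."""
    if param.startswith("CANDIDATE-SET-"):
        return frozenset({"CLUSTER"})
    return APPLIES_TO.get(param)


def check_job(job: Dict[str, str]) -> List[str]:
    """Return the problems found in a (hypothetical) job definition dict."""
    problems = [f"missing mandatory parameter {p}" for p in MANDATORY if p not in job]
    if "TYPE" not in job:
        return problems
    job_type = job["TYPE"]
    if job_type not in JOB_TYPES:
        problems.append(f"unknown job type {job_type}")
        return problems
    for param in job:
        allowed = allowed_job_types(param)
        if allowed is not None and job_type not in allowed:
            problems.append(f"{param} does not apply to {job_type} jobs")
    return problems
```

For example, a LOADIT job that sets STATUS-TIME is reported, while a CLUSTER job with CLUSTERING-METHOD=BEST passes.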