Batch Search Client - DupFinder

DupFinder is a batch search application that discovers duplicate records within data previously loaded into the Clustering Database. It uses each record in the Clustering Database as a search transaction against that same database, applies the nominated Search Definition to find duplicates, and writes the search results to a flat file.
Because every search transaction also exists as a record in the database, each record matches itself. The report shows these self-matches unless the Remove Search Record run-time option (see below) is used.
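
The self-matching behaviour described above can be illustrated with a minimal Python sketch. This is not DupFinder's implementation: the function name `find_duplicates`, the `threshold` parameter, and the use of `difflib.SequenceMatcher` as a scoring method are all assumptions standing in for a real Search Definition. It only shows why every record matches itself and what hiding the search record changes.

```python
from difflib import SequenceMatcher


def find_duplicates(records, threshold=0.85, remove_search_record=True):
    """Use every record as a search against the same collection.

    records: dict mapping a record ID to its text (hypothetical layout).
    threshold: minimum similarity treated as an acceptable score (assumed).
    remove_search_record: when True, skip the record's match against itself.
    Returns a list of (search_id, matched_id, score) tuples.
    """
    results = []
    for search_id, search_text in records.items():
        for cand_id, cand_text in records.items():
            if remove_search_record and cand_id == search_id:
                continue  # hide the self-match
            score = SequenceMatcher(
                None, search_text.lower(), cand_text.lower()
            ).ratio()
            if score >= threshold:
                results.append((search_id, cand_id, round(score, 2)))
    return results


sample = {1: "John Smith", 2: "Jon Smith", 3: "Mary Jones"}
with_self = find_duplicates(sample, remove_search_record=False)
without_self = find_duplicates(sample, remove_search_record=True)
```

With the self-matches kept, every record appears at least once in the results, including record 3, which has no real duplicate. With them removed, only the genuine pair (records 1 and 2) remains.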

Starting from the Console

DupFinder can be started from the Console Client by selecting Tools > DupFinder. This brings up the DupFinder options screen:
Field
Description
Output File
All duplicate records that have an acceptable score as determined by the Search Definition are written to the output file.
Search Definition
You must choose the Search Definition to be used from this drop-down list.
Search Width
If you have predefined search widths (Narrow, Typical or Exhaustive), you can choose one here. If left blank, the control defined in the relevant search is used.
Match Tolerance
If you have predefined match tolerances (Conservative, Typical or Loose), you can choose one here. If left blank, the control defined in the relevant search is used.
Output Format
Choose the output report format. Values 0 to 7 are valid and are described in the Relate - Report Formats section.
Starting Record ID
Enables commencement of the deduplication process at a nominated Record ID value.
Extra Options
This field can be used to enter extra command line switches supported by future versions of the DupFinder program. See the Extra options for Relate and/or DupFinder section below for more information.
Return Search Records Only
Return only the Search Records for which a match was found.
Remove Search Record
Because the file is matched against itself, every search record also matches its own identical copy, and the report shows these matches. This is usually not wanted, so select this option to hide the search record.
Append New Line
Append a newline to the output report after each record. This option affects only report formats 0, 1, 3, 4 and 6. If this option is not specified, all output records are written on a single line and the output must be treated as fixed-length records.
Trim Trailing Blanks
Remove trailing blanks from each output record. This option affects only report formats 0, 3, 4 and 6. It also implies Append New Line so that the boundaries between the output records are not lost.
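
The effect of Append New Line and Trim Trailing Blanks on a downstream reader can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the 10-character record length and the sample record contents are hypothetical, and the real record layout depends on the chosen report format. The helper names `split_fixed_length` and `split_newline_delimited` are invented for this example.

```python
def split_fixed_length(data: str, record_length: int) -> list[str]:
    """Split single-line output into records of a known fixed length."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if len(data) % record_length != 0:
        raise ValueError("data length is not a multiple of record_length")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def split_newline_delimited(data: str) -> list[str]:
    """Split output that has a newline after each record."""
    return data.splitlines()


# Hypothetical records padded with blanks to 10 characters each.
records = ["REC1".ljust(10), "REC2 A".ljust(10)]

# Neither option: one long line, readable only by record length.
single_line = "".join(records)

# Append New Line: one record per line, padding kept.
with_newlines = "".join(r + "\n" for r in records)

# Trim Trailing Blanks (implies Append New Line): padding removed.
trimmed = "".join(r.rstrip(" ") + "\n" for r in records)

# Trimming without newlines would lose the record boundaries.
trimmed_no_newlines = "".join(r.rstrip(" ") for r in records)
```

Once trailing blanks are removed, the records no longer share a common length, so a reader cannot recover their boundaries by length alone. That is why trimming only makes sense together with a newline after each record.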
