relperf

The relperf utility generates comparative performance statistics for a specified Search using a range of search strategies (search widths and match tolerances). By comparing the number of candidates selected with the number of accepted matches, relperf helps to determine the most appropriate strategy for a particular search problem.

Given a representative set of search data, relperf runs multiple search processes using all available search widths and match tolerances, then collates and summarizes the results.
The user specifies the search statistics to be reported by defining an output view that includes special statistical fields generated by the Search Server. Refer to the Output Views section in this guide for a list of the available statistical fields.

Although statistical fields may be written to an output view in any numeric format, the report file will only summarize statistics for fields that have a field type of 'R'. Therefore, any statistical field to be summarized in the report must have a format of R, with a length large enough to handle the number of rows processed. R,10 is adequate for most situations.

By default the report is tab delimited, which is suitable for importing into a spreadsheet. Use the -t switch to generate a report file that uses spaces instead of tabs.
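Because the default report is tab delimited, it can be consumed programmatically as well as by a spreadsheet. Below is a minimal sketch of loading such a report with Python's standard csv module; the column names are assumptions based on the example report later in this section, not a documented layout.

```python
import csv
import io

# Sample data in the assumed report shape (tab-separated, one header row).
SAMPLE_REPORT = (
    "Width\tTolerance\tCandAvg\tCandStdDev\tAccAvg\tAccStdDev\tPctOfCand\n"
    "Narrow\tConservative\t1.05\t0.94\t0.82\t0.61\t78.10\n"
    "Typical\tLoose\t2.47\t3.92\t2.25\t3.63\t91.09\n"
)

def load_report(text):
    """Parse tab-separated report rows, converting numeric columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        for key in ("CandAvg", "CandStdDev", "AccAvg", "AccStdDev", "PctOfCand"):
            row[key] = float(row[key])
        rows.append(row)
    return rows

rows = load_report(SAMPLE_REPORT)
print(rows[1]["Width"], rows[1]["PctOfCand"])  # Typical 91.09
```

In practice you would read the report file produced by relperf instead of an inline string; the parsing is the same.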

Starting from the Command Line

relperf can be started from the command line as follows:

For Win32, type the following:

%SSABIN%\relperf Search Infile Outfile OutputView -rRulebase -pSystem -hHost:Port -wWorkDir [Optional Switches]

For Unix, type the following:

$SSABIN/relperf Search Infile Outfile OutputView -rRulebase -pSystem -hHost:Port -wWorkDir [Optional Switches]
The values passed on the command line are described below. Parameters marked (mandatory) must be supplied.

Search (mandatory)
    Nominates the Search Definition to use. If multiple searches are to be run, separate them with commas. For example, searchname1,searchname2,searchname3

Infile (mandatory)
    Name of the file containing input records.

Outfile (mandatory)
    Name of the report file to generate.

OutputView
    Name of the output view to use.

-rRulebase (mandatory)
    Name of the Rulebase.

-pSystem (mandatory)
    Name of the System.

-hHost:Port (mandatory)
    Name of the host and port number (may be a Search or Connection Server).

-iInputViewName
    Nominates the view that describes the input records. If not specified, the IDT layout is assumed.

-nx[:y[:z]]
    Use x search threads, with an input queue of y records and an output queue of z records per thread.

-wWorkDir (mandatory)
    Work Directory.

-t
    Change the report format to use spaces instead of tabs.

-bTempfile
    Specifies a temporary file for relperf to use. By default relperf uses 'relperf.out' in the Work Directory.

-s
    Create a second report for each search, ordered by match tolerance. An example can be seen in the Example reports section below.

-a
    Create an alternate-style report with a histogram of accepted counts. An example can be seen in the Example reports section below.

-c
    Creates a default statistical view for use during the relperf run. This view contains the following fields:
        ksl-total-count ksl-accepted-count ksl-rejected-count ksl-undecided-count idx-io idt-io
    An output view does not need to be specified when using this option. However, if an output view is specified, a view is created for the run consisting of all the fields in the specified output view plus any of the default statistical fields not already present.

-dDatabase
    Name of the Database. Must be specified when using the -c option.

-eRulebaseHost
    Name of the Rulebase Host. Must be specified when using the -c option.

Example reports

Here is an example of a relperf report created using a simple output view and a Search Definition named search-namev2. It shows that as the search width increases from Narrow to Typical to Exhaustive, the number of candidates selected also increases. For a given set of candidates (that is, with the same search width), the number of accepted matches increases as the match tolerance becomes looser.

Search         Match          Candidates          Accepted
Widths         Tolerances     ------------------  -----------------------------
                              Average   Std Dev   Average   Std Dev   % of Cand

search-namev2
Narrow         Conservative      1.05      0.94      0.82      0.61       78.10
Narrow         Typical           1.05      0.94      0.82      0.61       78.10
Narrow         Loose             1.05      0.94      1.01      0.89       96.19
Typical        Conservative      2.47      3.92      1.22      0.54       49.39
Typical        Typical           2.47      3.92      1.26      0.58       51.01
Typical        Loose             2.47      3.92      2.25      3.63       91.09
Exhaustive     Conservative      4.59      6.38      1.22      0.54       26.58
Exhaustive     Typical           4.59      6.38      1.37      0.72       29.85
Exhaustive     Loose             4.59      6.38      2.89      3.69       62.96
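The "% of Cand" column can be derived from the two averages: it is the average number of accepted matches expressed as a percentage of the average number of candidates. A quick sketch using figures from the example report (the function name is illustrative only):

```python
def pct_of_cand(candidates_avg, accepted_avg):
    """Accepted average as a percentage of the candidate average."""
    return 100.0 * accepted_avg / candidates_avg

# Figures taken from the Loose rows of the example report above.
rows = [
    # (width, candidates_avg, accepted_avg)
    ("Narrow", 1.05, 1.01),
    ("Typical", 2.47, 2.25),
    ("Exhaustive", 4.59, 2.89),
]

for width, cand_avg, acc_avg in rows:
    print(f"{width:<12} {pct_of_cand(cand_avg, acc_avg):6.2f}")
```

A low percentage (such as the Exhaustive rows) indicates that many candidates are being fetched and scored only to be rejected, which is where the cost of a wider search shows up.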
This is the output view definition used for this report:

VIEW-DEFINITION
*==============
NAME=relx98stat
FIELD=Name,               C, 8
FIELD=ksl-total-count,    R, 4
FIELD=ksl-accepted-count, R, 4
This is an example of a report created using the -s switch:

Search         Match          Candidates          Accepted
Widths         Tolerances     ------------------  -----------------------------
                              Average   Std Dev   Average   Std Dev   % of Cand

search-namev2
Narrow         Conservative      1.05      0.94      0.82      0.61       78.10
Narrow         Typical           1.05      0.94      0.82      0.61       78.10
Narrow         Loose             1.05      0.94      1.01      0.89       96.19
Typical        Conservative      2.47      3.92      1.22      0.54       49.39
Typical        Typical           2.47      3.92      1.26      0.58       51.01
Typical        Loose             2.47      3.92      2.25      3.63       91.09
Exhaustive     Conservative      4.59      6.38      1.22      0.54       26.58
Exhaustive     Typical           4.59      6.38      1.37      0.72       29.85
Exhaustive     Loose             4.59      6.38      2.89      3.69       62.96
Extreme        Conservative      4.59      6.38      1.22      0.54       26.58
Extreme        Typical           4.59      6.38      1.37      0.72       29.85
Extreme        Loose             4.59      6.38      2.89      3.69       62.96

Narrow         Conservative      1.05      0.94      0.82      0.61       78.10
Typical        Conservative      2.47      3.92      1.22      0.54       49.39
Exhaustive     Conservative      4.59      6.38      1.22      0.54       26.58
Extreme        Conservative      4.59      6.38      1.22      0.54       26.58
Narrow         Typical           1.05      0.94      0.82      0.61       78.10
Typical        Typical           2.47      3.92      1.26      0.58       51.01
Exhaustive     Typical           4.59      6.38      1.37      0.72       29.85
Extreme        Typical           4.59      6.38      1.37      0.72       29.85
Narrow         Loose             1.05      0.94      1.01      0.89       96.19
Typical        Loose             2.47      3.92      2.25      3.63       91.09
Exhaustive     Loose             4.59      6.38      2.89      3.69       62.96
Extreme        Loose             4.59      6.38      2.89      3.69       62.96
This is an example showing the additional columns in a report created using the -a switch. A report generated with the -a switch does not output columns representing standard deviations.

Number of Accepted within Range
------------------------------------------------------------------------
   0      1   2 - 10   11 - 100   101-1000   1001-10000   > 10000
  26     69        5          0          0            0         0
  26     69        5          0          0            0         0
  26     58       16          0          0            0         0
   0     84       16          0          0            0         0
   0     81       19          0          0            0         0
   0     65       33          2          0            0         0
   0     84       16          0          0            0         0
   0     74       26          0          0            0         0
   0     45       53          2          0            0         0
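Each histogram column counts how many input records produced an accepted-match count within that range. A minimal sketch of this bucketing, assuming the range boundaries shown in the report header (the counting logic itself is an illustration, not relperf's implementation):

```python
# Bucket boundaries taken from the -a report header above.
BUCKETS = [
    ("0", lambda n: n == 0),
    ("1", lambda n: n == 1),
    ("2 - 10", lambda n: 2 <= n <= 10),
    ("11 - 100", lambda n: 11 <= n <= 100),
    ("101-1000", lambda n: 101 <= n <= 1000),
    ("1001-10000", lambda n: 1001 <= n <= 10000),
    ("> 10000", lambda n: n > 10000),
]

def histogram(accepted_counts):
    """Return {bucket_label: number_of_records} for a list of per-record
    accepted-match counts."""
    result = {label: 0 for label, _ in BUCKETS}
    for n in accepted_counts:
        for label, matches in BUCKETS:
            if matches(n):
                result[label] += 1
                break
    return result

# Five hypothetical input records with 0, 1, 1, 3 and 250 accepted matches.
print(histogram([0, 1, 1, 3, 250]))
```

Reading the rows this way makes the trade-off visible: a row with a large "0" count means many searches found nothing acceptable, while weight in the higher buckets means individual searches are returning large accepted sets.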

Performance Tips

  • The search performance is directly proportional to the number of candidates selected.
  • Use the results to assess how many additional matches will be found by using a wider search and/or looser match strategy. Is it worth processing twice as many candidates when only a few more records were accepted? The answer depends on the search problem, but at least you can assess the cost and the benefit.
  • Use the average number of accepted matches (plus n times the standard deviation) to set the SORT=Memory() parameter to ensure that all sorts are performed in-core. Setting n=3 ensures that 99% of sets will fit into memory.
  • Use the average number of candidates (plus n times the standard deviation) to ensure that the Candidate-Set-Size-Limit is large enough to record candidates that have already been processed.
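The last two tips use the same sizing formula: estimate = average + n × standard deviation. A quick sketch using the Exhaustive/Loose figures from the example report (the function and variable names are illustrative, not relperf parameters):

```python
def size_estimate(average, std_dev, n=3):
    """Allow headroom of n standard deviations above the mean."""
    return average + n * std_dev

# Accepted matches (Exhaustive/Loose row): drives the SORT=Memory() sizing.
sort_rows = size_estimate(average=2.89, std_dev=3.69)        # ~13.96

# Candidates (Exhaustive/Loose row): drives the Candidate-Set-Size-Limit sizing.
candidate_limit = size_estimate(average=4.59, std_dev=6.38)  # ~23.73

print(round(sort_rows, 2), round(candidate_limit, 2))
```

In practice you would round these estimates up to a comfortable whole number, since they are statistical bounds rather than hard limits.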
