relperf

The relperf utility generates comparative performance statistics for a specified Search using a range of search strategies (search widths and match tolerances). By comparing the number of candidates selected with the number of accepted matches, relperf helps to determine the most appropriate strategy for a particular search problem.

Given a representative set of search data, relperf runs multiple search processes using all available search widths and match tolerances, then collates and summarizes the results.
The user specifies the search statistics to be reported by defining an output view that includes special statistical fields generated by the Search Server. Refer to the Output Views section in this guide for a list of the available statistical fields.

Although statistical fields may be written to an output view in any numeric format, the report file will only summarize statistics for fields that have a field type of 'R'. Therefore, any statistical field to be summarized in the report must have a format of R, with a length large enough to handle the number of rows processed. R,10 is adequate for most situations.

By default the report is tab delimited, which is suitable for importing into a spreadsheet. Use the -t switch to generate a report file that uses spaces instead of tabs.
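Because the default report is tab delimited, it can be consumed programmatically as well as by a spreadsheet. Below is a minimal sketch of loading such a report with Python's standard csv module; the column names are assumptions based on the example report later in this section, not a documented layout.

```python
import csv
import io

# Sample data in the assumed report shape (tab-separated, one header row).
SAMPLE_REPORT = (
    "Width\tTolerance\tCandAvg\tCandStdDev\tAccAvg\tAccStdDev\tPctOfCand\n"
    "Narrow\tConservative\t1.05\t0.94\t0.82\t0.61\t78.10\n"
    "Typical\tLoose\t2.47\t3.92\t2.25\t3.63\t91.09\n"
)

def load_report(text):
    """Parse tab-separated report rows, converting numeric columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        for key in ("CandAvg", "CandStdDev", "AccAvg", "AccStdDev", "PctOfCand"):
            row[key] = float(row[key])
        rows.append(row)
    return rows

rows = load_report(SAMPLE_REPORT)
print(rows[1]["Width"], rows[1]["PctOfCand"])  # Typical 91.09
```

In practice you would read the report file produced by relperf instead of an inline string; the parsing is the same.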

Starting from the Command Line

relperf can be started from the command line as follows:

For Win32, type the following:

%SSABIN%\relperf Search Infile Outfile OutputView -rRulebase -pSystem -hHost:Port -wWorkDir [Optional Switches]

For Unix, type the following:

$SSABIN/relperf Search Infile Outfile OutputView -rRulebase -pSystem -hHost:Port -wWorkDir [Optional Switches]
The values passed on the command line are described below. Parameters marked (mandatory) must be supplied.

Search (mandatory)
    Nominates the Search Definition to use. If multiple searches are to be run, separate them with commas. For example, searchname1,searchname2,searchname3

Infile (mandatory)
    Name of the file containing input records.

Outfile (mandatory)
    Name of the report file to generate.

OutputView
    Name of the output view to use.

-rRulebase (mandatory)
    Name of the Rulebase.

-pSystem (mandatory)
    Name of the System.

-hHost:Port (mandatory)
    Name of the host and port number (may be a Search or Connection Server).

-iInputViewName
    Nominates the view that describes the input records. If not specified, the IDT layout is assumed.

-nx[:y[:z]]
    Use x search threads, with an input queue of y records and an output queue of z records per thread.

-wWorkDir (mandatory)
    Work Directory.

-t
    Change the report format to use spaces instead of tabs.

-bTempfile
    Specifies a temporary file for relperf to use. By default relperf uses 'relperf.out' in the Work Directory.

-s
    Create a second report for each search, ordered by match tolerance. An example can be seen in the Example reports section below.

-a
    Create an alternate-style report with a histogram of accepted counts. An example can be seen in the Example reports section below.

-c
    Creates a default statistical view for use during the relperf run. This view contains the following fields:
        ksl-total-count ksl-accepted-count ksl-rejected-count ksl-undecided-count idx-io idt-io
    An output view does not need to be specified when using this option. However, if an output view is specified, a view is created for the run consisting of all the fields in the specified output view plus any of the default statistical fields not already present.

-dDatabase
    Name of the Database. Must be specified when using the -c option.

-eRulebaseHost
    Name of the Rulebase Host. Must be specified when using the -c option.

Example reports

Here is an example of a relperf report created using a simple output view and a Search Definition named search-namev2. It shows that as the search width increases from Narrow to Typical to Exhaustive, the number of candidates selected also increases. For a given set of candidates (that is, with the same search width), the number of accepted matches increases as the match tolerance becomes looser.

Search         Match          Candidates          Accepted
Widths         Tolerances     ------------------  -----------------------------
                              Average   Std Dev   Average   Std Dev   % of Cand

search-namev2
Narrow         Conservative      1.05      0.94      0.82      0.61       78.10
Narrow         Typical           1.05      0.94      0.82      0.61       78.10
Narrow         Loose             1.05      0.94      1.01      0.89       96.19
Typical        Conservative      2.47      3.92      1.22      0.54       49.39
Typical        Typical           2.47      3.92      1.26      0.58       51.01
Typical        Loose             2.47      3.92      2.25      3.63       91.09
Exhaustive     Conservative      4.59      6.38      1.22      0.54       26.58
Exhaustive     Typical           4.59      6.38      1.37      0.72       29.85
Exhaustive     Loose             4.59      6.38      2.89      3.69       62.96
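The "% of Cand" column can be derived from the two averages: it is the average number of accepted matches expressed as a percentage of the average number of candidates. A quick sketch using figures from the example report (the function name is illustrative only):

```python
def pct_of_cand(candidates_avg, accepted_avg):
    """Accepted average as a percentage of the candidate average."""
    return 100.0 * accepted_avg / candidates_avg

# Figures taken from the Loose rows of the example report above.
rows = [
    # (width, candidates_avg, accepted_avg)
    ("Narrow", 1.05, 1.01),
    ("Typical", 2.47, 2.25),
    ("Exhaustive", 4.59, 2.89),
]

for width, cand_avg, acc_avg in rows:
    print(f"{width:<12} {pct_of_cand(cand_avg, acc_avg):6.2f}")
```

A low percentage (such as the Exhaustive rows) indicates that many candidates are being fetched and scored only to be rejected, which is where the cost of a wider search shows up.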
This is the output view definition used for this report:

VIEW-DEFINITION
*==============
NAME=relx98stat
FIELD=Name,               C, 8
FIELD=ksl-total-count,    R, 4
FIELD=ksl-accepted-count, R, 4
This is an example of a report created using the -s switch:

Search         Match          Candidates          Accepted
Widths         Tolerances     ------------------  -----------------------------
                              Average   Std Dev   Average   Std Dev   % of Cand

search-namev2
Narrow         Conservative      1.05      0.94      0.82      0.61       78.10
Narrow         Typical           1.05      0.94      0.82      0.61       78.10
Narrow         Loose             1.05      0.94      1.01      0.89       96.19
Typical        Conservative      2.47      3.92      1.22      0.54       49.39
Typical        Typical           2.47      3.92      1.26      0.58       51.01
Typical        Loose             2.47      3.92      2.25      3.63       91.09
Exhaustive     Conservative      4.59      6.38      1.22      0.54       26.58
Exhaustive     Typical           4.59      6.38      1.37      0.72       29.85
Exhaustive     Loose             4.59      6.38      2.89      3.69       62.96
Extreme        Conservative      4.59      6.38      1.22      0.54       26.58
Extreme        Typical           4.59      6.38      1.37      0.72       29.85
Extreme        Loose             4.59      6.38      2.89      3.69       62.96

Narrow         Conservative      1.05      0.94      0.82      0.61       78.10
Typical        Conservative      2.47      3.92      1.22      0.54       49.39
Exhaustive     Conservative      4.59      6.38      1.22      0.54       26.58
Extreme        Conservative      4.59      6.38      1.22      0.54       26.58
Narrow         Typical           1.05      0.94      0.82      0.61       78.10
Typical        Typical           2.47      3.92      1.26      0.58       51.01
Exhaustive     Typical           4.59      6.38      1.37      0.72       29.85
Extreme        Typical           4.59      6.38      1.37      0.72       29.85
Narrow         Loose             1.05      0.94      1.01      0.89       96.19
Typical        Loose             2.47      3.92      2.25      3.63       91.09
Exhaustive     Loose             4.59      6.38      2.89      3.69       62.96
Extreme        Loose             4.59      6.38      2.89      3.69       62.96
This is an example showing the additional columns in a report created using the -a switch. A report generated with the -a switch does not output columns representing standard deviations.

Number of Accepted within Range
------------------------------------------------------------------------
   0      1   2 - 10   11 - 100   101-1000   1001-10000   > 10000
  26     69        5          0          0            0         0
  26     69        5          0          0            0         0
  26     58       16          0          0            0         0
   0     84       16          0          0            0         0
   0     81       19          0          0            0         0
   0     65       33          2          0            0         0
   0     84       16          0          0            0         0
   0     74       26          0          0            0         0
   0     45       53          2          0            0         0
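Each histogram column counts how many input records produced an accepted-match count within that range. A minimal sketch of this bucketing, assuming the range boundaries shown in the report header (the counting logic itself is an illustration, not relperf's implementation):

```python
# Bucket boundaries taken from the -a report header above.
BUCKETS = [
    ("0", lambda n: n == 0),
    ("1", lambda n: n == 1),
    ("2 - 10", lambda n: 2 <= n <= 10),
    ("11 - 100", lambda n: 11 <= n <= 100),
    ("101-1000", lambda n: 101 <= n <= 1000),
    ("1001-10000", lambda n: 1001 <= n <= 10000),
    ("> 10000", lambda n: n > 10000),
]

def histogram(accepted_counts):
    """Return {bucket_label: number_of_records} for a list of per-record
    accepted-match counts."""
    result = {label: 0 for label, _ in BUCKETS}
    for n in accepted_counts:
        for label, matches in BUCKETS:
            if matches(n):
                result[label] += 1
                break
    return result

# Five hypothetical input records with 0, 1, 1, 3 and 250 accepted matches.
print(histogram([0, 1, 1, 3, 250]))
```

Reading the rows this way makes the trade-off visible: a row with a large "0" count means many searches found nothing acceptable, while weight in the higher buckets means individual searches are returning large accepted sets.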

Performance Tips

  • The search performance is directly proportional to the number of candidates selected.
  • Use the results to assess how many additional matches will be found by using a wider search and/or looser match strategy. Is it worth processing twice as many candidates when only a few more records were accepted? The answer depends on the search problem, but at least you can assess the cost and the benefit.
  • Use the average number of accepted matches (plus n times the standard deviation) to set the SORT=Memory() parameter to ensure that all sorts are performed in-core. Setting n=3 ensures that 99% of sets will fit into memory.
  • Use the average number of candidates (plus n times the standard deviation) to ensure that the Candidate-Set-Size-Limit is large enough to record candidates that have already been processed.
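The last two tips use the same sizing formula: estimate = average + n × standard deviation. A quick sketch using the Exhaustive/Loose figures from the example report (the function and variable names are illustrative, not relperf parameters):

```python
def size_estimate(average, std_dev, n=3):
    """Allow headroom of n standard deviations above the mean."""
    return average + n * std_dev

# Accepted matches (Exhaustive/Loose row): drives the SORT=Memory() sizing.
sort_rows = size_estimate(average=2.89, std_dev=3.69)        # ~13.96

# Candidates (Exhaustive/Loose row): drives the Candidate-Set-Size-Limit sizing.
candidate_limit = size_estimate(average=4.59, std_dev=6.38)  # ~23.73

print(round(sort_rows, 2), round(candidate_limit, 2))
```

In practice you would round these estimates up to a comfortable whole number, since they are statistical bounds rather than hard limits.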
