Design Guide

10.5 HotFix 2
- 10.5 HotFix 1
- 10.5
- 10.2 HotFix 1
- 10.2
- 10.1
- 10.0 HotFix 1
- 10.0

Back Next

Reducing Candidate Set Size

Flattening

This feature is used to reduce the size of the candidate set and the size of the IDX. Refer to the Flattening IDTs section in this guide.

Partitions

This feature is used to reduce the size of the candidate set.

For very large files, the key generated from the

KEY-FIELD

may have a high selectivity due to the sheer volume of data present on the file. Therefore searching for candidates using the key will create very large candidate sets.

If the nature of the data is well understood, it may be possible to qualify the key with additional data from the record so that the "qualified key" becomes more selective.

The

PARTITION

option instructs IIR to build a concatenated key from the Key-Logic

FIELD

and up to five fields/sub-fields taken from the IDT record. The partition information forms a high-order qualifier for the key (it is prefixed to the key).

For example, an application may wish to search all names in a telephone directory. If we are willing to only examine candidates from a particular region, we could partition the data using a post-code or some other information that can group the candidates into regions. Performance is improved by reducing the size of candidate sets. The disadvantage is that candidates will only be selected from within regions; not outside. If this makes sense from the perspective of the "business problem" being solved then partitioning can be used.

SQL Filters

SQL Filters may be used to reduce the size of the candidate set. They are evaluated by the DBMS as candidates are selected. By removing candidates within the DBMS engine, we reduce network overheads and matching costs within the Search Server.

An SQL Filter is a DBMS-specific SQL expression that is appended to the query that selects candidate rows from the IDX/IDT. It can only be used to remove candidates.

For example, a particular Search-Definition may perform a fuzzy search using a Name. Without filters, all candidates are returned to the Search Server to be matched against the search record. If a filter was specified requesting

STATE = ’NY’

, only candidates living in New York would be returned for matching. Although the same effect can be achieved using a partition, SQL filters provide more flexibility.

The SQL may be provided statically using the

FILTER=

parameter in the

Search-Definition

or at run-time with the

ids_search_filter

API. The Search-Definition options

Filter-Append

and

Filter-Replace

are used to control whether, and how dynamic filters will operate.

The DBMS needs access to native columns in the IDT in order to evaluate the expression in the filter. Therefore the columns mentioned in a filter must only refer to the columns available in the IDT. Note that source column names are prefixed with

"IDS_"

in the IDT.

A search that uses a filter will access both the IDX and IDT, performing significantly more database I/O than a standard search that only accesses the IDX. Therefore a filter-based search may be less efficient (slower), especially if it selects a large percentage of the IDX (which may result in full-index and/or full-table-scans of the IDX and IDT respectively). There is a balance point, beyond which the extra I/O is more expensive than performing a standard search and scoring all candidates. Customers should evaluate where this point lies based on their particular data and search requirements.

Lite Index cannot support a search filter.

Performance Tips

If an IDX will always be used in conjunction with a filter, and will never be used without one, specify

--Full-Key-Data

when creating the IDX. This will reduce its size and improve database cache efficiency. If full-index scans and the IDX are performed (due to wide searches), consider creating a dedicated IDX specifying

--Full-Key-Data

to reduce its size.

If the filter clause is of the form "column = value", consider creating a partition instead of using an SQL filter.

Consider creating a database index on the column(s) specified in the filter clause. Remember to collect optimizer statistics for the new index, and consider creating a frequency histogram (on Oracle), if the index values are not evenly distributed.

Static Filters

A static filter is defined in a Search-Definition with the

FILTER=

parameter. It can be overridden at run-time only if the Search Option

Filter-Replace

has been specified.

Dynamic Filters

Dynamic filters are provided with the

ids_search_filter

API. If the

Search-Definition

specifies the

Filter-Replace

option, the dynamic filter will replace the filter specified in the Search- Definition (if one was defined).

If the

Search-Definition

specifies the

Filter-Append

option, the dynamic filter will be appended to the filter specified in the Search-Definition (if one was defined).

Skeletal Filters

Skeletal filters are a combination of static and dynamic filters. If the SQL defined in the

FILTER=

parameter contains any substitution variables, a dynamic filter is used to provide values to be substituted into the SQL at run-time. Substitution variables are of the form

, where

= 1, 2,. . .

For example:

FILTER= "IDS_STATE = ’$1’"

defines a skeletal filter containing one substitution variable (

The API function

ids_search_filter

must be called before the search is started in order to provide values to be substituted into the skeletal SQL. Values remain in effect until the caller provides new values or switches to a new search.

The API call must provide a delimited string that contains enough values to satisfy the skeletal SQL’s requirements. The first character in the string defines the delimiter character used to separate values in the string.

For example:

|NY|

defines

as the delimiter and

as the first value. Note that the string must be terminated with the delimiter character. The effective filter (after substitution) is changed to,

IDS_STATE = ’NY’

Multi-Search

Dynamic and skeletal filters defined with

ids_search_filter

have an order of precedence when used in a Multi-Search.

IIR will use a filter defined for a sub-search (if present) before using a filter defined for the Multi-Search.

Using PL/SQL and User-Defined Functions

As noted earlier, a search that uses an SQL Filter will read both the IDX and IDT. The latter is necessary to access values of columns that appear in the WHERE predicate. Normally, the redundant data held in the IDX are stored in a proprietary compressed format and are inaccessible to the DBMS engine. However, if the redundant data were uncompressed, improved performance would result since IDX rows alone are sufficient to evaluate the predicate, removing the need to read IDT rows.

An alternative SQL Filter mechanism is available for just such a case. It requires that:

redundant data stored in the IDX are uncompressed. This is achieved by specifying the

--Full-Key-Data

option to remove the redundant (compressed) IDT data from the IDX row, and by specifying the

Key-Data=

parameter to store up to 5 uncompressed IDT columns in the IDX row (immediately following the 8-byte fuzzy key).

the creation of a PL/SQL or User-Defined Function that, given an IDX row, can examine the Key-Data to decide whether the particular row should be returned as a candidate for further matching,

an SQL Filter clause that calls the function (passing the IDX row) and interprets the result appropriately.

For example:

The following IDX Definition disables Full-Key-Data (which stores all IDT columns in the IDX) and specifies a list of IDT columns to be appended after the 8-byte fuzzy key (Name and State). Up to 5 columns (or part thereof) may be specified with the Key-Data parameter, as long as the total key length does not exceed the maximum key size permitted by the particular DBMS (~ 250 bytes).


idx-definition
*==================
NAME=   			namev2
ID= 				u5
IDT-NAME=	 	IDT305
KEY-LOGIC= 		SSA, System(default), Population(test),
					Controls ("FIELD=Person_Name KEY_LEVEL=Standard"),
					Field("name")
KEY-DATA= 		Name,State
OPTIONS= 		--Full-Key-Data

The full syntax for specifying Key-Data is:

KEY-DATA = field[,length,offset], ...

where

field

is the field name,

length

is an optional number of bytes to include (the entire field is included by default), and

offset

is an optional starting offset (base 0).

The Search-Definition includes a filter clause that calls a user-written function. The function call must include a field named

SSAKEY

to provide the function with access to the Key-Data. The response code returned by the function determines whether the row is kept or discarded. For example:


search-definition
*====================
NAME= 				search-filter-ny
IDX=					namev2
KEY-LOGIC=			SSA, System(default), Population(test),
						Controls ("FIELD=Person_Name SEARCH_LEVEL=typical"),
						Field("name")
KEY-SCORE-LOGIC=	SSA, System(default), Population(test),
						Controls("PURPOSE=Person_Name MATCH_LEVEL=Typical"),
						Matching-Fields("Name:person_name")
FILTER= 				"#ids_sql_filter.demo(SSAKEY)=1"
OPTIONS= 			Filter-Replace

Note that the filter statement:

begins with ’#’. This character informs the search engine to use the new semantics, otherwisethe original filter mechanism is used.

passes the entire IDX record (

SSAKEY

) to the function named

ids_sql_filter.demo

. The function examines the Key-Data and returns a decision to either keep or discard the row. The response-code is arbitrary. In this example, a value of 1 will keep the candidate row.

The corresponding function, written in PL/SQL, is as follows.


CREATE OR REPLACE PACKAGE BODY ids_sql_filter AS
		 FUNCTION demo (
					 ssakey 	IN RAW)
					 RETURN 	NUMBER
		 IS
					 rec 			VARCHAR (255);
					 state 		VARCHAR (2);
		 BEGIN
					 rec := utl_raw.cast_to_varchar2 (ssakey);
					 state := SUBSTR(rec,33,2);
					 IF (state = ’NY’) THEN
								RETURN 1;
					 END IF;
					 RETURN 0;
		END demo;
END ids_sql_filter;

The sample function performs the following actions:

converts the input parameter

ssakey

from raw to char, which allows the Key-Data to be extracted using a substring function.

extracts the value of State from the Key-Data. In this case, the Key-Data contains an 8-byte fuzzy-key, followed by a Name (for a length of 24 bytes), followed by State (2 bytes). Thus, the State field begins at offset 32 (base 0). The Oracle substring function assumes offsets start from 1. Therefore the substring command refers to position 33 for a length of 2 bytes to exact the value of State.

The extracted State value is compared to a value of ’NY’. When the candidate value is identical, the function returns 1, meaning that the candidate row should be kept. Otherwise, it returns 0, which effectively discards the candidate row.

Note that the value of

does not need to be hard-coded. It could have been passed in as the second parameter to the function by hardcoding it in the filter statement, or even set dynamically by using a substitution variable. For example,

FILTER= "#ids_sql_filter.demo(SSAKEY,$1)=1"

When using this alternative SQL Filter method, only the Key-Data fields will be available for matching, display, and return as part of the search results.

Rename Saved Search

Table of Contents

Design Guide

Design Guide

Reducing Candidate Set Size

Reducing Candidate Set Size

Flattening

Partitions

SQL Filters

Performance Tips

Static Filters

Dynamic Filters

Skeletal Filters

Multi-Search

Using PL/SQL and User-Defined Functions