Table of Contents

Search

  1. Preface
  2. Introduction
  3. Defining a System
  4. Flattening IDTs
  5. Link Tables
  6. Loading a System
  7. Static Clustering
  8. Simple Search
  9. Search Performance
  10. Miscellaneous Issues
  11. Limitations
  12. Error Messages

Reducing Candidate Set Size

Reducing Candidate Set Size

Flattening

This feature is used to reduce the size of the candidate set and the size of the IDX. Refer to the Flattening IDTs section in this guide.

Partitions

This feature is used to reduce the size of the candidate set.
For very large files, the key generated from the
KEY-FIELD
may have a high selectivity due to the sheer volume of data present on the file. Therefore searching for candidates using the key will create very large candidate sets.
If the nature of the data is well understood, it may be possible to qualify the key with additional data from the record so that the "qualified key" becomes more selective.
The
PARTITION
option instructs IIR to build a concatenated key from the Key-Logic
FIELD
and up to five fields/sub-fields taken from the IDT record. The partition information forms a high-order qualifier for the key (it is prefixed to the key).
For example, an application may wish to search all names in a telephone directory. If we are willing to only examine candidates from a particular region, we could partition the data using a post-code or some other information that can group the candidates into regions. Performance is improved by reducing the size of candidate sets. The disadvantage is that candidates will only be selected from within regions; not outside. If this makes sense from the perspective of the "business problem" being solved then partitioning can be used.

SQL Filters

SQL Filters may be used to reduce the size of the candidate set. They are evaluated by the DBMS as candidates are selected. By removing candidates within the DBMS engine, we reduce network overheads and matching costs within the Search Server.
An SQL Filter is a DBMS-specific SQL expression that is appended to the query that selects candidate rows from the IDX/IDT. It can only be used to remove candidates.
For example, a particular Search-Definition may perform a fuzzy search using a Name. Without filters, all candidates are returned to the Search Server to be matched against the search record. If a filter was specified requesting
STATE = ’NY’
, only candidates living in New York would be returned for matching. Although the same effect can be achieved using a partition, SQL filters provide more flexibility.
The SQL may be provided statically using the
FILTER=
parameter in the
Search-Definition
or at run-time with the
ids_search_filter
API. The Search-Definition options
Filter-Append
and
Filter-Replace
are used to control whether, and how dynamic filters will operate.
The DBMS needs access to native columns in the IDT in order to evaluate the expression in the filter. Therefore the columns mentioned in a filter must only refer to the columns available in the IDT. Note that source column names are prefixed with
"IDS_"
in the IDT.
A search that uses a filter will access both the IDX and IDT, performing significantly more database I/O than a standard search that only accesses the IDX. Therefore a filter-based search may be less efficient (slower), especially if it selects a large percentage of the IDX (which may result in full-index and/or full-table-scans of the IDX and IDT respectively). There is a balance point, beyond which the extra I/O is more expensive than performing a standard search and scoring all candidates. Customers should evaluate where this point lies based on their particular data and search requirements.
Lite Index cannot support a search filter.

Performance Tips

If an IDX will always be used in conjunction with a filter, and will never be used without one, specify
--Full-Key-Data
when creating the IDX. This will reduce its size and improve database cache efficiency. If full-index scans and the IDX are performed (due to wide searches), consider creating a dedicated IDX specifying
--Full-Key-Data
to reduce its size.
If the filter clause is of the form "column = value", consider creating a partition instead of using an SQL filter.
Consider creating a database index on the column(s) specified in the filter clause. Remember to collect optimizer statistics for the new index, and consider creating a frequency histogram (on Oracle), if the index values are not evenly distributed.

Static Filters

A static filter is defined in a Search-Definition with the
FILTER=
parameter. It can be overridden at run-time only if the Search Option
Filter-Replace
has been specified.

Dynamic Filters

Dynamic filters are provided with the
ids_search_filter
API. If the
Search-Definition
specifies the
Filter-Replace
option, the dynamic filter will replace the filter specified in the Search- Definition (if one was defined).
If the
Search-Definition
specifies the
Filter-Append
option, the dynamic filter will be appended to the filter specified in the Search-Definition (if one was defined).

Skeletal Filters

Skeletal filters are a combination of static and dynamic filters. If the SQL defined in the
FILTER=
parameter contains any substitution variables, a dynamic filter is used to provide values to be substituted into the SQL at run-time. Substitution variables are of the form
$n
, where
n
= 1, 2,. . .
For example:
FILTER= "IDS_STATE = ’$1’"
defines a skeletal filter containing one substitution variable (
$1
).
The API function
ids_search_filter
must be called before the search is started in order to provide values to be substituted into the skeletal SQL. Values remain in effect until the caller provides new values or switches to a new search.
The API call must provide a delimited string that contains enough values to satisfy the skeletal SQL’s requirements. The first character in the string defines the delimiter character used to separate values in the string.
For example:
|NY|
defines
|
as the delimiter and
NY
as the first value. Note that the string must be terminated with the delimiter character. The effective filter (after substitution) is changed to,
IDS_STATE = ’NY’

Multi-Search

Dynamic and skeletal filters defined with
ids_search_filter
have an order of precedence when used in a Multi-Search.
IIR will use a filter defined for a sub-search (if present) before using a filter defined for the Multi-Search.

Using PL/SQL and User-Defined Functions

As noted earlier, a search that uses an SQL Filter will read both the IDX and IDT. The latter is necessary to access values of columns that appear in the WHERE predicate. Normally, the redundant data held in the IDX are stored in a proprietary compressed format and are inaccessible to the DBMS engine. However, if the redundant data were uncompressed, improved performance would result since IDX rows alone are sufficient to evaluate the predicate, removing the need to read IDT rows.
An alternative SQL Filter mechanism is available for just such a case. It requires that:
  • redundant data stored in the IDX are uncompressed. This is achieved by specifying the
    --Full-Key-Data
    option to remove the redundant (compressed) IDT data from the IDX row, and by specifying the
    Key-Data=
    parameter to store up to 5 uncompressed IDT columns in the IDX row (immediately following the 8-byte fuzzy key).
  • the creation of a PL/SQL or User-Defined Function that, given an IDX row, can examine the Key-Data to decide whether the particular row should be returned as a candidate for further matching,
  • an SQL Filter clause that calls the function (passing the IDX row) and interprets the result appropriately.
For example:
The following IDX Definition disables Full-Key-Data (which stores all IDT columns in the IDX) and specifies a list of IDT columns to be appended after the 8-byte fuzzy key (Name and State). Up to 5 columns (or part thereof) may be specified with the Key-Data parameter, as long as the total key length does not exceed the maximum key size permitted by the particular DBMS (~ 250 bytes).
idx-definition *================== NAME= namev2 ID= u5 IDT-NAME= IDT305 KEY-LOGIC= SSA, System(default), Population(test), Controls ("FIELD=Person_Name KEY_LEVEL=Standard"), Field("name") KEY-DATA= Name,State OPTIONS= --Full-Key-Data
The full syntax for specifying Key-Data is:
KEY-DATA = field[,length,offset], ...
where
field
is the field name,
length
is an optional number of bytes to include (the entire field is included by default), and
offset
is an optional starting offset (base 0).
The Search-Definition includes a filter clause that calls a user-written function. The function call must include a field named
SSAKEY
to provide the function with access to the Key-Data. The response code returned by the function determines whether the row is kept or discarded. For example:
search-definition *==================== NAME= search-filter-ny IDX= namev2 KEY-LOGIC= SSA, System(default), Population(test), Controls ("FIELD=Person_Name SEARCH_LEVEL=typical"), Field("name") KEY-SCORE-LOGIC= SSA, System(default), Population(test), Controls("PURPOSE=Person_Name MATCH_LEVEL=Typical"), Matching-Fields("Name:person_name") FILTER= "#ids_sql_filter.demo(SSAKEY)=1" OPTIONS= Filter-Replace
Note that the filter statement:
  • begins with ’#’. This character informs the search engine to use the new semantics, otherwisethe original filter mechanism is used.
  • passes the entire IDX record (
    SSAKEY
    ) to the function named
    ids_sql_filter.demo
    . The function examines the Key-Data and returns a decision to either keep or discard the row. The response-code is arbitrary. In this example, a value of 1 will keep the candidate row.
The corresponding function, written in PL/SQL, is as follows.
CREATE OR REPLACE PACKAGE BODY ids_sql_filter AS FUNCTION demo ( ssakey IN RAW) RETURN NUMBER IS rec VARCHAR (255); state VARCHAR (2); BEGIN rec := utl_raw.cast_to_varchar2 (ssakey); state := SUBSTR(rec,33,2); IF (state = ’NY’) THEN RETURN 1; END IF; RETURN 0; END demo; END ids_sql_filter;
The sample function performs the following actions:
  • converts the input parameter
    ssakey
    from raw to char, which allows the Key-Data to be extracted using a substring function.
  • extracts the value of State from the Key-Data. In this case, the Key-Data contains an 8-byte fuzzy-key, followed by a Name (for a length of 24 bytes), followed by State (2 bytes). Thus, the State field begins at offset 32 (base 0). The Oracle substring function assumes offsets start from 1. Therefore the substring command refers to position 33 for a length of 2 bytes to exact the value of State.
  • The extracted State value is compared to a value of ’NY’. When the candidate value is identical, the function returns 1, meaning that the candidate row should be kept. Otherwise, it returns 0, which effectively discards the candidate row.
Note that the value of
NY
does not need to be hard-coded. It could have been passed in as the second parameter to the function by hardcoding it in the filter statement, or even set dynamically by using a substitution variable. For example,
FILTER= "#ids_sql_filter.demo(SSAKEY,$1)=1"
When using this alternative SQL Filter method, only the Key-Data fields will be available for matching, display, and return as part of the search results.

0 COMMENTS

We’d like to hear from you!