Relevancy Tuning

Filter Expressions

You can use filter expressions to filter or boost results when making a search request, enabling you to adjust the ranking in real time.

For example, to exclude a specific section of a website, such as a blog, you can use a filter like dir1!='blog'. You can also test filter expressions in our Preview section; refer to the Preview documentation for more details.

If you are using our Website Search Integration, see the details below.

Note: If you used code generated in the Integrate section of your Console, you are also using Website Search Integration.

Usage

Filters limit the results returned by a search. In a search interface, filters commonly appear as tabs, checkboxes, or sliders in a sidebar.

To make filtering easier, our crawler extracts common fields when it crawls web pages (such as the first and second directories of URLs).

Aside from commonly used fields like title, description and og:image, our crawler also extracts other fields that are useful for filtering. The examples below assume the page URL is https://www.sajari.com/blog/year-in-review:

  • url: the full page URL (https://www.sajari.com/blog/year-in-review)
  • dir1: the first directory of the page URL (blog)
  • dir2: the second directory of the page URL (year-in-review)
  • domain: the domain of the page URL (www.sajari.com)
  • lang: the language of the page, extracted from the <html> element (if present)
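To make the URL-derived fields concrete, here is a minimal Python sketch that computes them for the example URL. It assumes dir1 and dir2 are simply the first two non-empty path segments; extract_url_fields is a hypothetical helper, not our crawler's actual code, and edge cases such as query strings or missing segments may be handled differently in practice.

```python
from urllib.parse import urlsplit


def extract_url_fields(url: str) -> dict:
    """Approximate the URL-derived fields (url, domain, dir1, dir2).

    Assumption: dir1/dir2 are the first two non-empty path segments, and a
    segment that does not exist is simply left out.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    fields = {"url": url, "domain": parts.hostname}
    for i, segment in enumerate(segments[:2], start=1):
        fields[f"dir{i}"] = segment
    return fields


print(extract_url_fields("https://www.sajari.com/blog/year-in-review"))
```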

Using operators

When filtering a field, you can use the operators below. Note: all values must be enclosed in single quotation marks; for example, "field boost must be greater than 10" is written as boost>'10'.

Operator | Description | Example
Equal To (=) | Field is equal to a value (numeric or string) | dir1='blog'
Not Equal To (!=) | Field is not equal to a value (numeric or string) | dir1!='blog'
Greater Than (>) | Field is greater than a numeric value | boost>'10'
Greater Than Or Equal To (>=) | Field is greater than or equal to a numeric value | boost>='10'
Less Than (<) | Field is less than a numeric value | boost<'50'
Less Than Or Equal To (<=) | Field is less than or equal to a numeric value | boost<='50'
Begins With (^) | Field begins with a string | dir1^'bl'
Ends With ($) | Field ends with a string | dir1$'og'
Contains (~) | Field contains a string | dir1~'blog'
Does Not Contain (!~) | Field does not contain a string | dir1!~'blog'
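The operator semantics can be approximated locally, which helps when reasoning about what a filter will match. The following Python sketch evaluates one field expression against a record held in a dict. The matches function, its numeric-versus-string comparison rule for = and !=, and its behaviour for missing fields are assumptions made for illustration; it is not the parser used by the search service.

```python
import operator
import re

# Hypothetical local model of the operator table above; not the service's parser.
# Matches <field><operator>'<value>', e.g. dir1!='blog' or boost>='10'.
_EXPR = re.compile(r"\s*([\w.:]+)\s*(!=|!~|>=|<=|=|>|<|\^|\$|~)\s*'([^']*)'\s*")
_NUMERIC = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def _equal(actual, value: str) -> bool:
    # Compare as numbers when both sides parse as numbers, otherwise as strings.
    try:
        return float(actual) == float(value)
    except (TypeError, ValueError):
        return str(actual) == value


def matches(record: dict, expression: str) -> bool:
    """Evaluate a single field expression against a scalar-valued record."""
    m = _EXPR.fullmatch(expression)
    if m is None:
        raise ValueError(f"unsupported filter expression: {expression!r}")
    field, op, value = m.groups()
    if field not in record:
        return False  # assumption: a missing field never matches
    actual = record[field]
    if op in _NUMERIC:
        return _NUMERIC[op](float(actual), float(value))
    if op == "=":
        return _equal(actual, value)
    if op == "!=":
        return not _equal(actual, value)
    text = str(actual)
    if op == "^":
        return text.startswith(value)
    if op == "$":
        return text.endswith(value)
    if op == "~":
        return value in text
    return value not in text  # op == "!~"


page = {"dir1": "blog", "boost": 25}
print(matches(page, "dir1!='blog'"), matches(page, "boost>='10'"))
```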

Filtering arrays

The Contains (~) and Does Not Contain (!~) operators can also be used to filter values in an array. The following example shows a filter that returns all records with the colour red stored in a color array field.

Field | Array | Values | Example
color | Yes | red, blue, white | color ~ ['red']
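A minimal Python sketch of the array case follows. array_contains and array_not_contains are hypothetical helpers, and treating a list with several values, such as ['red', 'blue'], as "any of these values" is an assumption, since the behaviour for more than one value is not described here.

```python
from typing import Sequence


def array_contains(values: Sequence[str], wanted: Sequence[str]) -> bool:
    """Model of `field ~ [...]` on an array field.

    Assumption: the record matches if ANY wanted value is stored in the array.
    """
    return any(v in values for v in wanted)


def array_not_contains(values: Sequence[str], wanted: Sequence[str]) -> bool:
    """Model of `field !~ [...]`: none of the wanted values is present."""
    return not array_contains(values, wanted)


record = {"color": ["red", "blue", "white"]}
print(array_contains(record["color"], ["red"]))
```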

Combining expressions

It's also possible to build more complex filters by combining field filter expressions with AND/OR operators, and brackets.

Operator | Description | Example
AND | Both expressions must match | dir1='blog' AND domain='www.sajari.com'
OR | At least one expression must match | dir1='blog' OR domain='blog.sajari.com'

For example, to match pages with language set to en on www.sajari.com or any page within the en.sajari.com domain:

(domain='www.sajari.com' AND lang='en') OR domain='en.sajari.com'
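If you build filters in application code, small helpers keep the quoting rule and the brackets consistent. The Python sketch below reproduces the example above. field_filter, and_, or_ and group are hypothetical names, and because escaping single quotes inside values is not covered here, the sketch rejects such values rather than guessing.

```python
# Operators from the table under "Using operators".
OPERATORS = {"=", "!=", ">", ">=", "<", "<=", "^", "$", "~", "!~"}


def field_filter(field: str, op: str, value) -> str:
    """Build field<op>'value', enclosing the value in single quotes."""
    if op not in OPERATORS:
        raise ValueError(f"unknown operator {op!r}")
    if "'" in str(value):
        raise ValueError("values containing single quotes are not handled here")
    return f"{field}{op}'{value}'"


def and_(*exprs: str) -> str:
    return " AND ".join(exprs)


def or_(*exprs: str) -> str:
    return " OR ".join(exprs)


def group(expr: str) -> str:
    # Brackets make precedence explicit when mixing AND and OR.
    return f"({expr})"


flt = or_(
    group(and_(field_filter("domain", "=", "www.sajari.com"),
               field_filter("lang", "=", "en"))),
    field_filter("domain", "=", "en.sajari.com"),
)
print(flt)
```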

Filter functions

Some filters are difficult to express in boolean logic. For these cases, filter functions build the filter for you. Filter functions can also be used as part of larger boolean expressions.

Checking for existence or non-existence

IS_NULL(field)

Returns TRUE if field is NULL

IS_NOT_NULL(field)

Returns TRUE if field is NOT NULL
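Because filter functions can be part of larger boolean expressions, a filter such as IS_NOT_NULL(lang) AND dir1='blog' would restrict results to blog pages that have a language set. As a rough local model, the Python sketch below assumes a field counts as NULL when it is absent from the record or explicitly None; is_null and is_not_null are hypothetical helpers, and how the service treats empty strings is not specified here.

```python
def is_null(record: dict, field: str) -> bool:
    # Assumption: an absent field and an explicit None both count as NULL.
    return record.get(field) is None


def is_not_null(record: dict, field: str) -> bool:
    return not is_null(record, field)


pages = [
    {"url": "https://www.sajari.com/blog/a", "dir1": "blog", "lang": "en"},
    {"url": "https://www.sajari.com/blog/b", "dir1": "blog"},
]
# Local equivalent of IS_NOT_NULL(lang) AND dir1='blog'.
with_lang = [p["url"] for p in pages
             if is_not_null(p, "lang") and p.get("dir1") == "blog"]
print(with_lang)
```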

Filtering based on geo distance

GEO_INSIDE(latitude, longitude, lat_var, lng_var, radius)

Returns TRUE if the input geopoint (lat_var, lng_var) lies within radius kilometres of the record's geopoint (latitude, longitude), measured as haversine (great-circle) distance.
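The haversine distance behind GEO_INSIDE can be sketched in Python as follows. The 6371 km mean Earth radius and the inclusive comparison at the boundary are assumptions; the service may use a slightly different radius constant, so points very close to the boundary could be classified differently. haversine_km and geo_inside are hypothetical helpers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_inside(latitude: float, longitude: float,
               lat_var: float, lng_var: float, radius: float) -> bool:
    # Is the input point (lat_var, lng_var) within `radius` km of the record's point?
    return haversine_km(latitude, longitude, lat_var, lng_var) <= radius


# Record located in Sydney, input point in Melbourne.
print(round(haversine_km(-33.8688, 151.2093, -37.8136, 144.9631)))
```

For instance, Sydney and Melbourne are roughly 713 km apart by this formula, so a 750 km radius would include the pair while a 700 km radius would not.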

Note: to boost results by geo distance instead of filtering them, use the geo_boost step.

Time based filtering

SINCE_NOW(field, duration)

Returns TRUE if the time elapsed since the timestamp stored in field is less than or equal to duration.

A duration string is a signed sequence of decimal numbers, each with an optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns" (nanoseconds), "us" or "µs" (microseconds), "ms" (milliseconds), "s" (seconds), "m" (minutes) and "h" (hours). For example, to match records whose timestamp field is within 24 hours of the current time, use SINCE_NOW(field, '24h').
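The duration format above matches the one accepted by Go's time.ParseDuration. The Python sketch below parses it and applies one reading of SINCE_NOW, namely that a record matches when the time elapsed since its timestamp is at most the given duration. parse_duration and since_now are hypothetical helpers, and that interpretation (including how future timestamps and negative durations behave) is an assumption.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Nanoseconds per unit for the units listed above.
_UNIT_NS = {
    "ns": 1,
    "us": 1_000,
    "µs": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60_000_000_000,
    "h": 3_600_000_000_000,
}
# One "<number><unit>" component; "ms" is tried before "m" and "s".
_PART = r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)"
_DURATION = re.compile(rf"([-+]?)((?:{_PART})+)")


def parse_duration(text: str) -> float:
    """Parse a duration string such as "300ms", "-1.5h" or "2h45m" into seconds."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    sign, body = match.group(1), match.group(2)
    total_ns = sum(float(num) * _UNIT_NS[unit] for num, unit in re.findall(_PART, body))
    seconds = total_ns / 1e9
    return -seconds if sign == "-" else seconds


def since_now(timestamp: datetime, duration: str,
              now: Optional[datetime] = None) -> bool:
    """Assumed reading of SINCE_NOW: elapsed time since timestamp <= duration."""
    if now is None:
        now = datetime.now(timezone.utc)
    elapsed = (now - timestamp).total_seconds()
    return elapsed <= parse_duration(duration)


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(since_now(datetime(2024, 1, 1, 12, tzinfo=timezone.utc), "24h", now))
```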

Handling sub variants

ARRAY_MATCH(expression)

Returns TRUE for records where, at a single offset (array index), the values of the repeated fields together satisfy the expression.

For example, a record might hold 3 variants of a product with different prices, colors and sizes. These variants can be stored in three array fields, and the ARRAY_MATCH() filter function can evaluate each offset across these fields together, as if the values were singular. To illustrate, suppose the following product is indexed:

Field | Array | Values
title | No | Air jordan shoes
color | Yes | red, red, white
size | Yes | 13, 14, 13
price | Yes | 122.00, 122.00, 130.00

If we search using the filter function ARRAY_MATCH(size = '14' AND color = 'white'), each offset of the size and color arrays is evaluated together, as if the values were singular. Here no single offset matches: the second offset has size 14 but color red, and the third has color white but size 13. The function therefore returns FALSE for this record.
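To see why per-offset evaluation matters, here is a minimal Python sketch of the ARRAY_MATCH semantics applied to the product above. array_match is a hypothetical helper that assumes the listed arrays have equal length; the sketch contrasts its result with a naive check that looks at each field independently.

```python
from typing import Callable, Mapping, Sequence

# The product from the table above, with each variant spread across parallel arrays.
product = {
    "title": "Air jordan shoes",
    "color": ["red", "red", "white"],
    "size": [13, 14, 13],
    "price": [122.00, 122.00, 130.00],
}


def array_match(record: Mapping[str, Sequence], fields: Sequence[str],
                predicate: Callable[..., bool]) -> bool:
    """True if at least one single offset satisfies the predicate.

    Assumption: the listed fields are arrays of equal length; the predicate
    receives the values found at one offset as keyword arguments.
    """
    length = len(record[fields[0]])
    return any(
        predicate(**{f: record[f][i] for f in fields})
        for i in range(length)
    )


# Evaluated per offset: no variant is both size 14 and white, so this prints False.
print(array_match(product, ["size", "color"],
                  lambda size, color: size == 14 and color == "white"))
# A naive per-field check would wrongly accept the record, so this prints True.
print(14 in product["size"] and "white" in product["color"])
```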

This filter function is a powerful way to handle variants. Common examples include price variations (e.g. volume-based price breaks) and automotive parts, where a single part can match many makes and models.