How Sajari works

Spelling

Almost half of the users who can’t find what they need on their first search will abandon a site immediately.

Sajari’s spelling correction is designed to help users find the results they are looking for from the first search, even if they have not spelled their query terms correctly.

Spelling correction is available on all plans. In this section, we will explain how spelling correction works, and how you can customize it.

How does spelling work?

Sajari builds a custom spelling model based on the records in your collection. This ensures that spelling correction works for custom terms, like brand names, that aren’t in a standard dictionary.

When a text query is submitted, your custom spelling model provides a list of word and phrase suggestions that are potential alternatives to the query.

Suggestions are determined by a combination of the following:

Edit distance

Edit distance quantifies how dissimilar two words are by counting the number of steps needed to transform one word into the other. Spelling corrections can be detected for words within an edit distance of two.

Transformations with an edit distance of one from the incorrect word to the correct word:

  • bke -> bike - missing letter
  • boke -> bike - letter substitution
  • biike -> bike - extra letter
  • bkie -> bike - letter swap
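
The four transformations listed above can be counted with a short routine. What follows is a minimal sketch, assuming the "optimal string alignment" variant of edit distance, in which an adjacent letter swap counts as a single step. The function name edit_distance is hypothetical, and this is not necessarily how Sajari computes the value internally.

```python
def edit_distance(a: str, b: str) -> int:
    """Count the insertions, deletions, substitutions and adjacent swaps
    needed to turn `a` into `b` (optimal string alignment distance).

    Illustrative only: the exact algorithm Sajari uses is not documented here.
    """
    a, b = a.lower(), b.lower()
    # dist[i][j] holds the number of edits needed to turn a[:i] into b[:j].
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i  # delete every letter of a[:i]
    for j in range(len(b) + 1):
        dist[0][j] = j  # insert every letter of b[:j]

    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # extra letter in a
                dist[i][j - 1] + 1,         # missing letter in a
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Two adjacent letters swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[len(a)][len(b)]


for typo in ["bke", "boke", "Biike", "Bkie"]:
    print(typo, "->", "bike", edit_distance(typo, "bike"))
```

Each of the four misspellings is one step away from "bike", so all of them fall inside the two-step limit described above.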

Popularity

Popular words and phrases can be good candidates for spelling suggestions if they appear frequently in your collection.

Example: You have an ecommerce store that sells bicycles, and a customer enters the query “clock”. Your store sells a lot of bike locks and no clocks, so “lock” might be a good candidate for a spelling suggestion.
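
As a rough illustration of the bicycle store example, the sketch below generates every word one edit away from the query and ranks those that occur in the collection by how often they appear. The term counts are invented for the example, and the helper names one_edit_variants and suggest_by_popularity are hypothetical; Sajari's actual ranking combines popularity with the other signals described in this section.

```python
import string
from collections import Counter


def one_edit_variants(word):
    """Return every string one deletion, swap, substitution or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1}
    substitutions = {left + c + right[1:]
                     for left, right in splits if right for c in letters}
    inserts = {left + c + right for left, right in splits for c in letters}
    return deletes | swaps | substitutions | inserts


def suggest_by_popularity(query, term_counts, limit=3):
    """Rank collection terms one edit from the query by frequency (assumed signal)."""
    query = query.lower()
    candidates = {w for w in one_edit_variants(query)
                  if w in term_counts and w != query}
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(candidates, key=lambda w: (-term_counts[w], w))[:limit]


# Hypothetical term frequencies for a bicycle store's collection.
term_counts = Counter({"lock": 120, "locks": 85, "block": 4, "bike": 300})
print(suggest_by_popularity("clock", term_counts))
```

With these made-up counts, "lock" ranks ahead of "block" because it appears far more often in the collection, even though both are one edit from "clock".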

Phrase correction

Sajari can detect and correct phrases that are present in your collection or in user queries. Bigram (two-word) and trigram (three-word) detection can detect and correct misspelled words based on the words around them.

Example:

  • Now York City -> New York City

Phrase splitting and combining ensure that a query entered as one term is split into two words if the two-word spelling is more common. This also works in reverse, combining two separate words into a single term.

Examples:

  • handcream -> hand cream
  • tread mill -> treadmill
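
One way to picture splitting and combining is to generate every spacing variant of the query and keep the one seen most often. This is a minimal sketch under that assumption; the phrase counts are invented and split_or_combine is a hypothetical name, not part of Sajari's API.

```python
from collections import Counter


def split_or_combine(query, phrase_counts):
    """Return the spacing variant of `query` that is most common in `phrase_counts`.

    Falls back to the query as typed when no variant is more common.
    """
    words = query.lower().split()
    original = " ".join(words)
    variants = {original}

    # Combine each pair of adjacent words into a single term.
    for i in range(len(words) - 1):
        merged = words[:i] + [words[i] + words[i + 1]] + words[i + 2:]
        variants.add(" ".join(merged))

    # Split each word into two at every internal position.
    for i, word in enumerate(words):
        for cut in range(1, len(word)):
            pieces = words[:i] + [word[:cut], word[cut:]] + words[i + 1:]
            variants.add(" ".join(pieces))

    # Highest count wins; on a tie, prefer what the user typed.
    return max(variants,
               key=lambda v: (phrase_counts.get(v, 0), v == original, v))


# Hypothetical phrase frequencies drawn from a collection.
phrase_counts = Counter({
    "hand cream": 40, "handcream": 2,
    "treadmill": 25, "tread mill": 1,
})
print(split_or_combine("handcream", phrase_counts))
print(split_or_combine("tread mill", phrase_counts))
```

A query with no known variants, such as a brand name that only appears in one form, is returned unchanged.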

After taking into account edit distance, popularity, and phrases, Sajari calculates a score for each spelling suggestion. The score indicates how confident the system is that a spelling suggestion is a good alternative for the input query.

A search is then run using both the user’s original query and the suggestions from spelling correction. This process is handled entirely by the spelling system, with no need for you to intervene or pass additional queries.

For example, a user may be searching for cheap headphones, but through a combination of misspellings and typos their final query may be “chep head phones”. Rather than make one guess at a correction, spelling correction will search for a variety of different queries depending on the edit distance, term popularity, and phrase correction. The final weighted query that is sent to the engine for processing may look like:

chep head phones
cheap headphones
cheap head phones
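
To make the idea of a weighted query concrete, the sketch below turns the original query and its scored suggestions into normalized weights. The scores, the default weight given to the original query, and the function name weighted_query are all assumptions made for illustration; the real scoring formula is internal to Sajari and is not described here.

```python
def weighted_query(original, suggestions, original_score=1.0):
    """Combine the original query and scored suggestions into normalized weights.

    `suggestions` maps each alternative query to a hypothetical confidence score.
    Returns (query, weight) pairs, highest weight first, with weights summing to 1.
    """
    scored = {original: original_score}
    scored.update(suggestions)
    total = sum(scored.values())
    ranked = sorted(scored.items(), key=lambda item: -item[1])
    return [(query, score / total) for query, score in ranked]


# Invented confidence scores for the suggestions shown above.
suggestions = {"cheap headphones": 0.8, "cheap head phones": 0.4}
for query, weight in weighted_query("chep head phones", suggestions):
    print(f"{weight:.2f}  {query}")
```

Keeping the original query in the weighted set means that records matching the text exactly as typed can still be returned.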

Sajari takes a probabilistic approach to spelling because an incorrect spelling correction can be harmful to a business. In this instance, the customer’s product data may contain both the correct and incorrect spellings of “headphones”, or there may be a new brand called “chep” on the site with only a few products. A weighted, probabilistic approach lowers the chance of returning zero results for a query and frustrating the user.

Training the spelling system

For website collections, spelling is configured automatically and requires no additional setup. If you would like to include custom fields in your spelling model, follow the custom fields guide and then refer to the customization information below.