Pipelines

Configuring Spelling

For website collections, spelling is configured automatically and requires no additional setup. Learn more about how spelling correction works in the user guide.

For e-commerce and app collections, an initial spelling configuration is created during the console onboarding process. You can further customize your spelling model through your record pipeline.

Training the spelling system

Every time a record is indexed, your spelling model can be trained from specified fields in that record.

Add the train-spelling step to your record pipeline to enable training for your spelling model.

Use the const:fields parameter to specify the fields in your data that should train the spelling model. As records are added to your collection, data from these fields is used to build a custom spelling dictionary for your collection.

Each field listed in the const:fields parameter is assigned a label. The example below assigns labels to four fields:

- id: train-spelling
  params:
    fields:
      const: name:name,brand:brand,categories:categories,description:description

The default language used to train spelling is English. The training language determines how text is processed as your spelling model is built. For example, Japanese characters require different processing logic from Roman-script languages such as English, French, and Italian. You can change the language by including the input:lang parameter.
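As a minimal sketch of changing the training language, the step below follows the same bind/defaultValue pattern used for the lang parameter elsewhere on this page. The language code "ja", the input name "lang", and the choice of fields are assumptions made for illustration; check which language codes your collection supports before using it.

```yaml
# Hypothetical example: train spelling with Japanese processing.
# "ja" is an assumed language code and "lang" is an assumed input name.
- id: train-spelling
  title: train spelling with japanese
  params:
    lang:
      bind: lang
      defaultValue: ja
    fields:
      const: name:name,description:description
```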

Disabling language processing for specific fields

You may not want any language processing to be performed on some fields. Fields such as model numbers and part numbers are usually best left unprocessed. The following example sets the lang parameter of the train-spelling step to ‘zxx’ (no language processing) for the specified fields.

- id: train-spelling
  title: set lang to zxx (no language) to train spelling for model number etc.
  params:
    lang:
      bind: nolang
      defaultValue: zxx
    fields:
      const: modelNumber:modelNumber

Using multiple languages and spelling models

You can train spelling for multiple languages and have a separate spelling model for each language. The following example shows how you can train spelling for English and “zxx” for model numbers together.

- id: train-spelling
  title: train spelling with english (default)
  params:
    fields:
      const: name:name,brand:brand,categories:categories,description:description
- id: train-spelling
  title: set lang to zxx (no language) to train spelling for model number etc.
  params:
    lang:
      bind: nolang
      defaultValue: zxx
    fields:
      const: modelNumber:modelNumber

Enabling spell correction

Add the index-spelling step in your query pipeline to enable spelling correction during a search query. The following example shows how to pass the query q for spelling correction using a default configuration:

- id: index-spelling
  params:
    text:
      bind: q

It is highly recommended to set const:phraseLabelWeights to tell the spelling system which labels correspond to phrases. This increases the accuracy of spelling correction for multi-word queries. Typically, the labels set in the train-autocomplete-v2 step of the record pipeline and the train-autocomplete step of the query pipeline are used to configure const:phraseLabelWeights.

The following example shows how to set const:phraseLabelWeights using live query training (denoted by the label query) and fields from the record (‘brand’ and ‘category’).

Note that all labels are weighted equally, with a value of “1.0”.

- id: index-spelling
  params:
    text:
      bind: q
    phraseLabelWeights:
      const: query:1.0,brand:1.0,category:1.0

Weights influence the score assigned to a phrase suggestion. Different labels can have different weights. The example below gives live query training twice the weight of the record fields.

- id: index-spelling
  params:
    text:
      bind: q
    phraseLabelWeights:
      const: query:1.0,brand:0.5,category:0.5