Indexing Data

Defining a Schema

Overview

A schema defines the structure of your data in the search index. It specifies the data types (string, integer, etc.) and the constraints for the data you want to index, which ensures that the data stored in an index is interpreted correctly.

Each record of a collection must adhere to the defined schema for it to be added to the index.

From the schema section in the Console, you can:

  • View the schema fields
  • Create a schema field

When you create a Site Search collection, a pre-built website schema template is used by default. You can view the default schema from the schema section in the Console.

You can create additional schema fields if you want to index additional data items.

For example, your website may contain webpages with specific metadata or content (e.g. "Author Name") that you want to display in the search interface. To index the additional metadata or content in a Site Search collection, you need to:

  1. Add a schema field (e.g. author_name) and select the relevant schema field type.
  2. Add custom tags to your webpage or content.
  3. Re-crawl the webpage (https://app.sajari.com/collection/domains).

Schema for an E-commerce Store or App

When setting up a collection for an e-commerce store or an app, you need to define a schema from scratch. Create a new collection from the Console, add a sample record or upload the data you want to index, and click "Generate Schema" to create schema fields based on your data.

In the next step, you can verify the generated schema and adjust it if required. You must choose at least one unique field to proceed. In most cases, the unique field is an id, sku, or URL.
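To make the idea of generating a schema from a sample record more concrete, here is a minimal Python sketch. It is not how the Console implements "Generate Schema"; the function names (infer_field_type, generate_schema), the dictionary layout, and the choice to map every Python float to DOUBLE are assumptions made for illustration only. Strings are always reported as STRING, so timestamp detection is left out.

```python
def infer_field_type(value):
    """Map a Python value to a field type name (illustrative mapping only)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"  # assumption: prefer the higher-precision type
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def generate_schema(sample_record, unique_field):
    """Build a list of hypothetical schema field definitions from one record."""
    if unique_field not in sample_record:
        raise KeyError(f"unique field {unique_field!r} is not in the sample record")
    fields = []
    for name, value in sample_record.items():
        is_list = isinstance(value, list)
        if is_list and not value:
            raise ValueError(f"cannot infer a type from the empty list in {name!r}")
        sample = value[0] if is_list else value
        fields.append({
            "name": name,
            "type": infer_field_type(sample),
            "list": is_list,
            # The chosen unique field becomes UNIQUE; everything else starts as NULLABLE.
            "mode": "UNIQUE" if name == unique_field else "NULLABLE",
        })
    return fields


sample = {"sku": "A-1001", "price": 19.99, "in_stock": True,
          "quantity": 3, "tags": ["shoes", "sale"]}
schema = generate_schema(sample, unique_field="sku")
for field in schema:
    print(field)
```

A generated schema like this is only a starting point: it still needs the manual review described above, for example to pick FLOAT instead of DOUBLE or to mark fields as REQUIRED.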

Understanding Schema fields

Field properties

Each schema field has the following properties.

Property   Description
Name       The name of the field. This uniquely identifies the field.
Type       The type of data stored in the field. See the Field Types table below for a description of the types.
List       Whether the field contains a single value or multiple values of the specified type. Also referred to as an "array" or "repeated field".
Mode       The level of strictness. Can be NULLABLE, REQUIRED, or UNIQUE.

Mode:

  • NULLABLE: the field can be null
  • REQUIRED: the field must be present
  • UNIQUE: the field must be present and unique across all records
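The three modes can be read as checks applied to each incoming record. The following Python sketch illustrates that reading. The validate_records function and the schema-as-dictionary layout are hypothetical, and treating a null value as "not present" for REQUIRED and UNIQUE fields is an assumption, not documented behaviour.

```python
def validate_records(records, modes):
    """Return a list of (record_index, message) problems; empty if all records pass.

    `modes` maps a field name to "NULLABLE", "REQUIRED", or "UNIQUE".
    """
    problems = []
    # Track the values already seen for each UNIQUE field.
    seen = {name: set() for name, mode in modes.items() if mode == "UNIQUE"}
    for index, record in enumerate(records):
        for name, mode in modes.items():
            value = record.get(name)
            # Assumption: a missing key and an explicit None are both "not present".
            if mode in ("REQUIRED", "UNIQUE") and value is None:
                problems.append((index, f"{name} is missing"))
                continue
            if mode == "UNIQUE":
                if value in seen[name]:
                    problems.append((index, f"{name} duplicates an earlier record"))
                seen[name].add(value)
    return problems


modes = {"id": "UNIQUE", "title": "REQUIRED", "author_name": "NULLABLE"}
records = [
    {"id": "1", "title": "First page"},
    {"id": "1", "title": "Second page", "author_name": None},  # duplicate id
    {"id": "2"},                                               # no title
]
problems = validate_records(records, modes)
print(problems)
```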

Field Types

The following field types can be used on the records in your Collection.

Name       Description                                                    Example
BOOLEAN    Two possible values: true or false                             true
INTEGER    Whole numbers                                                  42
FLOAT      Numbers with a fractional value (32-bit float)                 12.34
DOUBLE     Numbers with a high-precision fractional value, e.g. lat/lng   12.3456789
STRING     String of text                                                 "I'm the walrus"
TIMESTAMP  Date/time value                                                1234567890 (UNIX timestamp) or 2009-02-13T23:31:30+00:00 (RFC3339)
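Two details of the table can be checked with a short Python sketch using only the standard library: the two TIMESTAMP examples describe the same instant, and a 32-bit FLOAT cannot hold the DOUBLE example exactly. The helper name as_float32 is made up for this illustration, and the snippet says nothing about how the index itself stores values.

```python
import struct
from datetime import datetime, timezone

# The UNIX timestamp and the RFC3339 string from the table are the same moment.
from_unix = datetime.fromtimestamp(1234567890, tz=timezone.utc)
from_rfc3339 = datetime.fromisoformat("2009-02-13T23:31:30+00:00")
print(from_unix.isoformat())      # 2009-02-13T23:31:30+00:00
print(from_unix == from_rfc3339)  # True


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# A 32-bit float keeps roughly 7 significant digits, so the DOUBLE example
# changes slightly when squeezed into a FLOAT.
original = 12.3456789
rounded = as_float32(original)
print(rounded == original)        # False
```

This is why values that need many significant digits, such as latitude and longitude, are a better fit for DOUBLE than FLOAT.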

Changing Schema fields

If your collection already contains records, you can:

All other properties of a schema field cannot be changed if your collection already contains records. There are possible workarounds:

  • Add a new field with a different name with the desired properties
  • Remove all the records from the collection, make the changes to the schema, and then add the records again.

Alternatively, you can also start again by creating a new collection.
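The first workaround, adding a new field with the desired properties, usually means filling that new field from the old one before re-adding or updating records. The Python sketch below shows one way to prepare such records locally. The function add_converted_field and the field names price and price_value are hypothetical, and uploading the resulting records is outside the scope of this sketch.

```python
def add_converted_field(records, old_name, new_name, convert):
    """Return copies of the records with new_name filled from old_name.

    The original records are left untouched, and records without a value
    for old_name are copied as they are.
    """
    updated = []
    for record in records:
        copy = dict(record)
        if copy.get(old_name) is not None:
            copy[new_name] = convert(copy[old_name])
        updated.append(copy)
    return updated


# Example: 'price' was created as a STRING, and a numeric field is wanted instead.
records = [{"id": "1", "price": "19.99"}, {"id": "2"}]
migrated = add_converted_field(records, "price", "price_value", float)
print(migrated)
```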

Change a field to be indexed

To change a field to be indexed, you need to update the record pipelines.

Let's assume that you have a schema field named 'topics' and you didn't make it 'indexed' when you initially added it. To make the 'topics' field an indexed field, do the following:

  1. Go to the Pipelines section
  2. Choose the latest 'Record' pipeline (e.g. "'RECORD' website").
  3. Add the field 'topics' to the values of both the 'create-indexes' and 'allow-fields' steps. Save the pipeline and make it the default version.
  4. Do a full re-index of your collection so that the data on every record is indexed.

You will also need to update your query pipeline to suit your relevance needs.