Defining a Schema
A schema defines the structure of your data in the search index. It specifies the data types (string, integer, etc.) and the constraints for the data you want to index, which ensures that the data stored in an index is interpreted correctly.
Each record of a collection must adhere to the defined schema for it to be added to the index.
From the schema section in the Console, you can:
- View the schema fields
- Create a schema field
Schema for Site Search
When you create a Site Search collection, a pre-built website schema template is used by default. You can view the default schema from the schema section in the console.
You can create additional schema fields if you want to index additional data items.
For example, your website may contain webpages with specific metadata or content (e.g. "Author Name") that you want to display in the search interface. To index the additional metadata or content in a Site Search collection, you need to:
- Add a schema field (e.g. author_name) and select the relevant schema field type.
- Add custom tags to your webpage or content.
- Re-crawl the webpage (https://app.sajari.com/collection/domains).
Schema for an E-commerce Store or App
When setting up a collection for an e-commerce store or an app, you need to define a schema from scratch. Create a new collection from the Console, add a sample record or upload the data that you want to index, and click "Generate Schema" to create schema fields based on your data.
Once done, the next step lets you verify the schema and make adjustments if required. You must choose at least one unique field to proceed. In most cases, the unique field is an id, sku, or URL.
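To illustrate the idea of generating schema fields from a sample record, here is a minimal Python sketch. It is not the Console's actual algorithm: the function names, the output shape, and the mapping from Python types to field types are assumptions made for illustration. INTEGER is taken from the data types mentioned in the introduction, and an empty list defaults to STRING.

```python
def infer_type(value):
    """Map a sample Python value to a schema field type (illustrative mapping)."""
    # bool is tested first because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "STRING"


def infer_schema(sample_record):
    """Build a list of hypothetical schema field definitions from one record."""
    fields = []
    for name, value in sample_record.items():
        is_list = isinstance(value, list)
        if is_list:
            # Use the first element as the sample; an empty list falls back to STRING
            field_type = infer_type(value[0]) if value else "STRING"
        else:
            field_type = infer_type(value)
        fields.append({"name": name, "type": field_type, "list": is_list})
    return fields


sample = {"sku": "A-1", "price": 12.5, "in_stock": True, "tags": ["red", "sale"]}
schema_fields = infer_schema(sample)
```

A generated schema like this still needs review: the sketch cannot know which field should be unique, so that choice remains a manual step.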
Understanding Schema fields
Each schema field has the following properties.
|Property||Description|
|Name||The name of the field. This uniquely identifies the field.|
|Type||The type of data stored in the field. See the table below for a description of the types.|
|List||Whether the field can contain a single value or multiple values of the specified type. Also referred to as an "array" or "repeated field".|
|Mode||The mode defines the level of strictness and can be NULLABLE, REQUIRED, or UNIQUE:|
- NULLABLE: the field can be null
- REQUIRED: the field must be present
- UNIQUE: the field must be present and unique across all records
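The three modes can be read as checks applied to every record. The following Python sketch shows one way such checks could work on an in-memory list of records. The schema dictionary, the field names, and the `check_modes` function are hypothetical, and treating a null value as "not present" for REQUIRED and UNIQUE fields is an assumption.

```python
# Hypothetical schema: field name -> mode
SCHEMA_MODES = {
    "id": "UNIQUE",
    "title": "REQUIRED",
    "author_name": "NULLABLE",
}


def check_modes(records, modes):
    """Return a list of human-readable mode violations for the given records."""
    errors = []
    # Track values already seen for each UNIQUE field
    seen = {name: set() for name, mode in modes.items() if mode == "UNIQUE"}
    for i, record in enumerate(records):
        for name, mode in modes.items():
            value = record.get(name)
            # REQUIRED and UNIQUE fields must be present (assumed: non-null)
            if mode in ("REQUIRED", "UNIQUE") and value is None:
                errors.append(f"record {i}: '{name}' is missing")
                continue
            # UNIQUE fields must not repeat a value from an earlier record
            if mode == "UNIQUE":
                if value in seen[name]:
                    errors.append(f"record {i}: '{name}' duplicates an earlier record")
                seen[name].add(value)
    return errors


records = [
    {"id": "1", "title": "First"},
    {"id": "1", "title": "Second", "author_name": None},
    {"id": "2"},
]
violations = check_modes(records, SCHEMA_MODES)
```

In this example the second record repeats the `id` of the first and the third record has no `title`, so two violations are reported, while the null `author_name` is accepted because the field is NULLABLE.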
The following field types can be used on the records in your Collection.
|Type||Description||Example|
|BOOLEAN||Two possible values: true or false||true|
|FLOAT||Numbers with a fractional value (32-bit float)||12.34|
|DOUBLE||Numbers with a high-precision fractional value, e.g. lat/lng||12.3456789|
|STRING||String of text||"I'm the walrus"|
|TIMESTAMP||Date/time value||1234567890 (UNIX timestamp) or 2009-02-13T23:31:30+00:00 (RFC3339)|
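The two TIMESTAMP examples in the table denote the same instant. As a quick check, this Python sketch normalises both forms before indexing using only the standard library. The `parse_timestamp` helper is a hypothetical name, and interpreting a UNIX timestamp as seconds in UTC is an assumption.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Convert a UNIX timestamp (seconds) or an RFC 3339 string to an aware datetime."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Interpret numeric input as seconds since the epoch, in UTC
        return datetime.fromtimestamp(value, tz=timezone.utc)
    # Normalise a trailing "Z" so that older Python versions can parse the offset
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


from_unix = parse_timestamp(1234567890)
from_rfc3339 = parse_timestamp("2009-02-13T23:31:30+00:00")
same_instant = from_unix == from_rfc3339  # True: both examples are the same moment
```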
Changing Schema fields
If your collection already contains records, you can:
- change the 'description' of a schema field
- change the 'mode' to a less restrictive value ('UNIQUE' -> 'REQUIRED' -> 'NULLABLE')
- change a field to be 'indexed'
No other properties of a schema field can be changed once your collection contains records. Possible workarounds are:
- Add a new field with a different name and the desired properties.
- Remove all the records from the collection, make the changes to the schema, and then add the records again.
Alternatively, you can also start again by creating a new collection.
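The rule that a mode can only move to a less restrictive value ('UNIQUE' -> 'REQUIRED' -> 'NULLABLE') can be expressed as a simple ordering. The sketch below is illustrative only. The `STRICTNESS` ranking and the `is_relaxation` function are hypothetical names, not part of any API.

```python
# Higher number = stricter mode, following UNIQUE -> REQUIRED -> NULLABLE
STRICTNESS = {"NULLABLE": 0, "REQUIRED": 1, "UNIQUE": 2}


def is_relaxation(current_mode, new_mode):
    """Return True if new_mode is strictly less restrictive than current_mode."""
    return STRICTNESS[new_mode] < STRICTNESS[current_mode]


allowed = is_relaxation("UNIQUE", "NULLABLE")    # relaxing is permitted
rejected = is_relaxation("NULLABLE", "REQUIRED")  # tightening is not
```

A check like this could be run before submitting a schema change for a collection that already contains records, to catch a disallowed mode change early.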
Change a field to be indexed
To change a field to be indexed, you need to update the record pipelines.
Let's assume that you have a schema field named 'topics', and you didn't make it 'indexed' when you initially added it. To make the 'topics' field an indexed field, do the following:
- Go to the Pipelines section
- Choose the latest 'Record' pipeline (e.g. "'RECORD' website").
- Add the field 'topics' to the values of both the 'create-indexes' and 'allow-fields' steps. Save the pipeline and make it the default version.
- You will need to do a full re-index of your collection to ensure the data on every record is indexed.
You will also need to update your query pipeline to suit your relevance needs.
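The record pipeline edit described above amounts to adding one field name to two steps. The Python sketch below models that change on a plain dictionary. The dictionary layout (a "steps" list whose entries have "id" and "fields" keys) is a hypothetical stand-in and does not reflect the real pipeline format, which you edit in the Pipelines section.

```python
import copy

# Steps that must list a field for it to be indexed, per the procedure above
TARGET_STEPS = ("create-indexes", "allow-fields")


def add_indexed_field(pipeline, field):
    """Return a copy of the pipeline with `field` added to both target steps."""
    updated = copy.deepcopy(pipeline)  # leave the saved version untouched
    for step in updated["steps"]:
        if step["id"] in TARGET_STEPS and field not in step["fields"]:
            step["fields"].append(field)
    return updated


# Hypothetical representation of a record pipeline
record_pipeline = {
    "steps": [
        {"id": "create-indexes", "fields": ["title"]},
        {"id": "allow-fields", "fields": ["title", "description"]},
    ]
}
new_version = add_indexed_field(record_pipeline, "topics")
```

Returning a copy mirrors the workflow of saving a new pipeline version and then making it the default, rather than altering the existing version in place.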