Moss - Real-time Semantic Search for Conversational AI

Sometimes the answer is spread across separate corpora - a product catalog, its reviews, and an FAQ - that you keep as distinct indexes. Multi-index search queries several loaded indexes in a single call and returns the global top-K, with each result tagged by its source index.

How it works

Load the indexes (in bulk with load_indexes), then query them together with query_multi_index. Every result document carries an index_name so you know where it came from.
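To make the merge concrete, here is a minimal sketch of a global top-K over several indexes. It does not call the Moss SDK: the toy two-dimensional vectors, the cosine scoring, and the merge_top_k helper are all assumptions made for illustration, and the real query_multi_index may compute and return results differently.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_top_k(indexes, query_vec, top_k):
    """Hypothetical helper: score every document in every index,
    tag each hit with its source index, and keep the global top_k."""
    scored = (
        {"id": doc_id, "index_name": name, "score": cosine(query_vec, vec)}
        for name, docs in indexes.items()
        for doc_id, vec in docs.items()
    )
    return heapq.nlargest(top_k, scored, key=lambda hit: hit["score"])


# Toy stand-ins for three loaded indexes that share one embedding space.
indexes = {
    "catalog": {"sku-1": [1.0, 0.0], "sku-2": [0.0, 1.0]},
    "reviews": {"rev-1": [0.9, 0.1]},
    "faq": {"faq-1": [0.5, 0.5]},
}

results = merge_top_k(indexes, query_vec=[1.0, 0.0], top_k=2)
for hit in results:
    print(hit["index_name"], hit["id"], round(hit["score"], 3))
```

With top_k=2, only the two best hits across all three indexes survive, so an index can contribute zero results if its documents score lower than those elsewhere. Comparing scores this way is only meaningful because every vector comes from the same embedding space.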

Behavior

All indexes must be loaded locally (via load_index / load_indexes) and share the same embedding model.
top_k is global, not per-index - it caps the merged result set.
Embedding-only: results are scored by vector similarity alone, so alpha is ignored. BM25 scores are not comparable across separate corpora, because term statistics differ from one corpus to the next. filter and embedding work the same as in a single-index query.
Bulk lifecycle: load_indexes(names) returns a LoadIndexesResult with loaded and failed (best-effort; a typo in one name doesn’t roll back the others), and unload_indexes(names) releases them and is idempotent.
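The bulk lifecycle semantics described above can be modeled with a small in-memory stand-in. This is a sketch of the described behavior, not the SDK: ToyIndexStore and ToyLoadResult are hypothetical names, and the field types (a list for loaded, a dict of name to reason for failed) are assumptions, since the source only states that the result has loaded and failed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLoadResult:
    """Hypothetical stand-in for the result of a bulk load."""
    loaded: list = field(default_factory=list)   # names that loaded
    failed: dict = field(default_factory=dict)   # name -> reason (assumed shape)


class ToyIndexStore:
    """Hypothetical in-memory model of best-effort load and idempotent unload."""

    def __init__(self, available):
        self.available = set(available)
        self.loaded = set()

    def load_indexes(self, names):
        result = ToyLoadResult()
        for name in names:
            if name in self.available:
                self.loaded.add(name)
                result.loaded.append(name)
            else:
                # Best-effort: record the failure and keep going.
                result.failed[name] = "index not found"
        return result

    def unload_indexes(self, names):
        # discard() ignores names that are not loaded, so repeats are safe.
        for name in names:
            self.loaded.discard(name)


store = ToyIndexStore(available=["catalog", "reviews", "faq"])

# "reveiws" is a deliberate typo: it fails without undoing the other loads.
result = store.load_indexes(["catalog", "reveiws", "faq"])
loaded_after_load = set(store.loaded)

store.unload_indexes(["catalog", "faq"])
store.unload_indexes(["catalog", "faq"])  # second call is a no-op
```

In practice this means callers should inspect failed after a bulk load rather than assume every requested index is available to query.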

Implementation

Multi-index search is a Python SDK capability. See the Python guide for a runnable example.

Related

Retrieval: Single-index querying, filters, and hybrid search.

Python reference: query_multi_index, load_indexes, unload_indexes.
