Moss - Real-time Semantic Search for Conversational AI

Real-Time Local Indexing

A session is an in-process index. You open it with client.session(name), then add, query, and delete documents locally. Embedding and search run on-device, so no operation makes a network call. When you are done, call push_index() to persist the index to the cloud. In the SDK, a session is a SessionIndex object.

Operations

add_docs(docs, options?) - embeds and indexes documents locally; returns (added, updated), the number of documents added and updated.
query(text, options?) - semantic or hybrid search over the in-memory index; returns a SearchResult.
get_docs(options?) / delete_docs(ids) - read and remove documents locally.
push_index() - uploads the session to the cloud, creating or replacing the cloud index of the same name. No server-side re-embedding.
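The operations above can be pictured with a toy, pure-Python model. This is a minimal sketch, not Moss's implementation. Doc and ToySessionIndex are hypothetical names, and word counts scored with cosine similarity stand in for the local embedding model. Treating add_docs as an upsert keyed on id, where a repeated id counts as "updated", is an assumption based on the (added, updated) return value.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    text: str


def _vector(text: str) -> Counter:
    # Stand-in "embedding": lowercase word counts. Moss uses a real embedding model.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySessionIndex:
    """Hypothetical in-memory index mirroring add_docs / query / get_docs / delete_docs."""

    def __init__(self) -> None:
        self._docs: dict[str, Doc] = {}
        self._vectors: dict[str, Counter] = {}

    def add_docs(self, docs: list[Doc]) -> tuple[int, int]:
        # Assumed upsert semantics: a new id is "added", an existing id is "updated".
        added = updated = 0
        for doc in docs:
            if doc.id in self._docs:
                updated += 1
            else:
                added += 1
            self._docs[doc.id] = doc
            self._vectors[doc.id] = _vector(doc.text)
        return added, updated

    def get_docs(self) -> list[Doc]:
        return list(self._docs.values())

    def delete_docs(self, ids: list[str]) -> None:
        # Unknown ids are silently ignored in this toy.
        for doc_id in ids:
            self._docs.pop(doc_id, None)
            self._vectors.pop(doc_id, None)

    def query(self, text: str, top_k: int = 3) -> list[tuple[str, float]]:
        # Brute force: score every document against the query, best first.
        q = _vector(text)
        scored = [(doc_id, _cosine(q, vec)) for doc_id, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = ToySessionIndex()
print(index.add_docs([Doc("1", "Ship the SDK by Friday."), Doc("2", "Follow up about latency.")]))  # (2, 0)
print(index.add_docs([Doc("2", "Follow up with the LiveKit team about latency.")]))  # (0, 1)
print(index.query("latency follow up", top_k=1))
```

Because every operation mutates or scans a Python dict in the same process, nothing here touches the network, which is the property the operation list describes.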

session(name) is create-or-resume: if a cloud index with that name already exists it is loaded into the session (no re-embedding); otherwise the session starts empty. The API is the same in both cases.
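The create-or-resume rule for session(name) and the create-or-replace rule for push_index() can be sketched with a plain dict standing in for cloud storage. All names here (CLOUD, ToySession, open_session) are hypothetical and not part of the Moss SDK. The toy stores raw text only. Since the source states that neither pushing nor loading re-embeds, a real stored index presumably keeps its vectors.

```python
from __future__ import annotations

# Hypothetical stand-in for cloud storage: index name -> {doc id: text}.
CLOUD: dict[str, dict[str, str]] = {}


class ToySession:
    def __init__(self, name: str, docs: dict[str, str]) -> None:
        self.name = name
        self.docs = docs

    def push_index(self) -> None:
        # Create or replace the cloud index of the same name with a snapshot of this session.
        CLOUD[self.name] = dict(self.docs)


def open_session(name: str) -> ToySession:
    # Create-or-resume: copy an existing cloud index into the session, otherwise start empty.
    existing = CLOUD.get(name)
    return ToySession(name, dict(existing) if existing is not None else {})


notes = open_session("notes")  # no cloud index yet, so the session starts empty
notes.docs["1"] = "Ship the SDK by Friday."
notes.push_index()  # creates the cloud index "notes"
print(open_session("notes").docs)  # {'1': 'Ship the SDK by Friday.'}
```

The caller's code is identical whether the name already exists or not, matching the point above that the API is the same in both cases.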

Example

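import asyncio
import os

from moss import DocumentInfo, MossClient, QueryOptions

# Credentials come from environment variables (these variable names are an assumption).
MOSS_PROJECT_ID = os.environ["MOSS_PROJECT_ID"]
MOSS_PROJECT_KEY = os.environ["MOSS_PROJECT_KEY"]

async def main():
    client = MossClient(MOSS_PROJECT_ID, MOSS_PROJECT_KEY)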

    # Open a session: new local index, or load an existing cloud index by name.
    session = await client.session(index_name="notes")

    # Add documents (embedded and indexed locally).
    added, updated = await session.add_docs([
        DocumentInfo(id="1", text="Ship the on-device SDK by Friday."),
        DocumentInfo(id="2", text="Follow up with the LiveKit team about latency."),
    ])
    print(f"{added} added, {updated} updated, {session.doc_count} total")

    # Query the in-memory index.
    results = await session.query("what's due this week", QueryOptions(top_k=3))
    for doc in results.docs:
        print(f"{doc.id} score={doc.score:.3f} {doc.text}")

asyncio.run(main())

Performance characteristics

Operations run in-process: no network round trip, TLS, or serialization on the query path.
Query text is embedded by a local model; with model_id="custom" you supply the query vector via QueryOptions.embedding instead.
Local queries typically complete in single-digit milliseconds, which suits short, frequent queries against a per-session or per-user working set.
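To check the single-digit-millisecond figure against your own working set, you can time repeated queries. A minimal sketch using only the standard library. measure_latency_ms is a hypothetical helper, not part of the Moss SDK, and the fake_query coroutine is a placeholder for a real query.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Dict


async def measure_latency_ms(
    run_query: Callable[[], Awaitable[object]], n: int = 100, warmup: int = 5
) -> Dict[str, float]:
    """Await run_query() sequentially and return latency percentiles in milliseconds."""
    if n < 2:
        raise ValueError("need at least two samples to compute percentiles")
    for _ in range(warmup):
        # Untimed calls so one-off setup costs do not skew the numbers.
        await run_query()
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        await run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points; the inclusive method keeps them within the observed range.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98], "max": max(samples)}


async def _demo() -> None:
    async def fake_query() -> list:
        # Placeholder for an in-process query call.
        await asyncio.sleep(0)
        return []

    print(await measure_latency_ms(fake_query, n=50))


asyncio.run(_demo())
```

Inside the example's main(), you could pass lambda: session.query("what's due this week", QueryOptions(top_k=3)) instead of fake_query. The numbers you see will depend on hardware, the embedding model, and index size. Sequential timing reports per-query latency, not throughput under concurrency.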

Session vs. loaded cloud index

Use a session when	Use a loaded cloud index when
You are indexing data created at runtime (a live call, a chat, a working set)	You are querying a large, stable corpus built ahead of time
You need writes and reads against the same in-memory index	Reads dominate and the data changes infrequently
The data is per-conversation or per-user and short-lived	The index is shared across many sessions

The two compose: load a persistent index and open a session in the same client. See Live-call context.

Related

Sessions guide

The full session lifecycle and API.

SessionIndex reference

Methods, parameters, and return types.
