A session is a local index you read and write in real time: adds, deletes, reads, and queries involve no cloud round trip. Sessions are how Moss indexes during a live interaction, for example indexing transcript turns mid-call, building a per-user working set, or accumulating context that’s handed off between agents. A session is represented by a SessionIndex, created from MossClient.session().
Sessions are available in the Python, Swift, Elixir, and C SDKs today. JavaScript (Node) session support is coming.

Lifecycle

1. Create or resume

client.session(name) returns a SessionIndex. If a cloud index with that name already exists, it auto-loads into the session (no re-embedding); otherwise the session starts empty. The workflow is identical in both cases.

2. Mutate locally

add_docs, delete_docs, and get_docs run in memory. add_docs embeds locally via the Rust core, with no network call. With model_id="custom", each document must carry its own .embedding.

3. Query locally

query runs entirely in-memory (~1-10 ms) and supports the same metadata filter syntax as MossClient.query().

4. Persist (optional)

push_index() uploads the session - documents and their locally-computed embeddings - to the cloud under the session’s name, creating or replacing that index. No server-side re-embedding occurs.
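
For the model_id="custom" case in step 2, a pre-flight check can catch documents without an embedding before they are handed to add_docs. The sketch below is plain Python and does not call the SDK: the Doc class is a hypothetical stand-in (not the SDK's DocumentInfo), the helper name is made up for illustration, and the requirement that all embeddings share one dimension is an assumption rather than something this page states.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for a document; not the SDK's DocumentInfo."""
    id: str
    text: str
    embedding: Optional[Sequence[float]] = None


def check_custom_embeddings(docs: Sequence[Doc]) -> int:
    """Return the shared embedding dimension, or raise ValueError.

    Intended to run before add_docs when the session uses model_id="custom".
    """
    dim = None
    for doc in docs:
        # Every document must bring its own vector in custom mode.
        if doc.embedding is None or len(doc.embedding) == 0:
            raise ValueError(f"document {doc.id!r} has no embedding")
        if dim is None:
            dim = len(doc.embedding)
        elif len(doc.embedding) != dim:
            # Assumption: one index holds vectors of a single dimension.
            raise ValueError(
                f"document {doc.id!r} has dimension {len(doc.embedding)}, expected {dim}"
            )
    if dim is None:
        raise ValueError("no documents to check")
    return dim
```

Running a check like this first keeps a bad batch from being partially applied to the in-memory index.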

Example

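import asyncio
import os

from moss import DocumentInfo, MossClient, QueryOptions

# Credentials are read from the environment here; these variable names are an assumption.
MOSS_PROJECT_ID = os.environ["MOSS_PROJECT_ID"]
MOSS_PROJECT_KEY = os.environ["MOSS_PROJECT_KEY"]

async def main():
    client = MossClient(MOSS_PROJECT_ID, MOSS_PROJECT_KEY)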

    # Create or resume by name.
    session = await client.session(index_name="my-session-index")
    print(f"{session.doc_count} existing docs loaded")

    # Mutate locally - appended to whatever was loaded.
    added, updated = await session.add_docs([
        DocumentInfo(id="follow-up-1", text="Customer confirmed the refund was received."),
    ])

    # Query locally.
    results = await session.query("refund outcome", QueryOptions(top_k=3))
    for doc in results.docs:
        print(f"{doc.id} score={doc.score:.3f} {doc.text}")

    # Persist to the cloud.
    result = await session.push_index()
    print(f"Pushed {result.doc_count} docs (job {result.job_id})")

asyncio.run(main())

Short-term vs. long-term context

A session is short-term context - the working set for the current interaction. A persistent cloud index loaded with load_index() is long-term context - durable knowledge shared across interactions. Most real-time apps query both; see Live-call context.
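
When an app queries both a session and a long-term index, the two result lists have to be combined somehow. One possible approach is sketched below in plain Python, without calling the SDK. The Hit class is a hypothetical stand-in carrying the id, score, and text fields seen on result documents in the example above. The sketch assumes that a higher score means a closer match and that scores from the two indexes are comparable, which is most plausible when both use the same embedding model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    """Hypothetical stand-in for a result document with id, score and text."""
    id: str
    score: float
    text: str


def merge_hits(short_term: Iterable[Hit], long_term: Iterable[Hit], top_k: int = 5) -> List[Hit]:
    """Combine hits from both indexes, keeping the best-scoring copy of each id."""
    best = {}
    for hit in list(short_term) + list(long_term):
        current = best.get(hit.id)
        if current is None or hit.score > current.score:
            best[hit.id] = hit
    # Highest score first; assumes a higher score means a closer match.
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:top_k]
```

Deduplicating by id matters when a session was resumed from the same cloud index that is also loaded as long-term context, since the same document can then appear in both result lists.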

Models

The session’s embedding model is set by the model_id argument to session() (default "moss-minilm"; also "moss-mediumlm" or "custom"). When resuming an existing cloud index, omit model_id to adopt the stored model - passing a different one raises a ValueError. All participants resuming the same index must use the same model.
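
The model rule above can be restated as a small decision function. This is a plain-Python illustration of the rule as described on this page, not the SDK's implementation; resolve_model_id is a hypothetical name, and accepting an explicitly passed model that matches the stored one is an assumption.

```python
from typing import Optional

DEFAULT_MODEL_ID = "moss-minilm"


def resolve_model_id(requested: Optional[str], stored: Optional[str]) -> str:
    """Mirror the documented rule for choosing a session's model.

    requested: the model_id the caller passed, or None if omitted.
    stored: the model of an existing cloud index, or None for a new session.
    """
    if stored is None:
        # New session: use the caller's choice or the documented default.
        return requested or DEFAULT_MODEL_ID
    if requested is None or requested == stored:
        # Resuming: adopt the model the index was built with.
        return stored
    # Resuming with a different model is rejected.
    raise ValueError(
        f"index was built with {stored!r}; cannot resume with {requested!r}"
    )
```

In practice this means code that resumes a shared index is safest when it leaves model_id out entirely.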

Authentication

Project credentials are validated when the session is opened; session() raises if they’re invalid. See Authentication.

Real-time local indexing

Why sessions are sub-10 ms.

Cross-agent handoff

Resume a session on another agent or channel.