Hydration: cloud to device
When you open a session or load an index, Moss downloads the index binary and deserializes it into memory, so the agent starts warm instead of empty. No re-embedding happens; the prebuilt vectors are loaded directly.
Two layers of freshness
Once an index is live, freshness has two independent handles - one for getting changes into the index, and one for propagating those changes to running agents.
Layer 1: mutations
add_docs and delete_docs are the write API. They are asynchronous: you submit the
change, and Moss rebuilds the index server-side without touching your live service. A
mutation moves through statuses (pending_upload, uploading, building, completed);
while building, it reports finer-grained phases such as generating_embeddings and
building_index. Once completed, the new version is available in the cloud.
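The status progression can be sketched as a simple polling loop. This is only an illustration: `FakeMutationClient`, `get_status`, and `wait_until_completed` are hypothetical names invented here, not the Moss SDK; only the status strings come from the text above.

```python
import time

# Statuses named in the text above, in the order a mutation moves through them.
STATUSES = ["pending_upload", "uploading", "building", "completed"]


class FakeMutationClient:
    """Hypothetical stand-in for a service client; advances one status per poll."""

    def __init__(self):
        self._statuses = iter(STATUSES)

    def get_status(self, job_id: str) -> str:
        # A real service would look up the current status of job_id.
        return next(self._statuses, "completed")


def wait_until_completed(client, job_id, poll_seconds=0.0, max_polls=100):
    """Poll until the mutation reports 'completed'; return the distinct statuses seen."""
    seen = []
    for _ in range(max_polls):
        status = client.get_status(job_id)
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "completed":
            return seen
        time.sleep(poll_seconds)
    raise TimeoutError(f"mutation {job_id} did not complete after {max_polls} polls")


history = wait_until_completed(FakeMutationClient(), job_id="job-123")
print(" -> ".join(history))
```

Because the rebuild happens server-side, a loop like this only observes progress; it does not need to block the live service while it waits.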
Layer 2: runtime hot-swap
Load an index with auto_refresh enabled and Moss keeps the in-memory copy current automatically, polling for new versions and swapping them in without a restart.
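Conceptually, a hot-swap of this kind can be modeled as a version check followed by a reference swap. The sketch below is an assumption about the general technique, not Moss internals: `HotSwapIndex`, `latest_version`, and `download` are hypothetical names, and the "index" is a plain dictionary.

```python
import threading


class HotSwapIndex:
    """Keeps an in-memory copy and replaces it when a newer version is published."""

    def __init__(self, latest_version, download, polling_interval_in_seconds=600):
        self._latest_version = latest_version  # hypothetical: returns current cloud version
        self._download = download              # hypothetical: returns data for a version
        self._interval = polling_interval_in_seconds
        self._stop = threading.Event()
        self._version = latest_version()
        self._data = download(self._version)

    def query(self, key):
        data = self._data  # single reference read; a swap never mutates this object
        return data.get(key)

    def refresh_once(self):
        """Check for a new version; swap it in if there is one."""
        version = self._latest_version()
        if version == self._version:
            return False
        new_data = self._download(version)  # load the new copy off to the side
        self._data, self._version = new_data, version  # swap references
        return True

    def run_forever(self):
        # Event.wait returns False on timeout, so this polls until stop() is called.
        while not self._stop.wait(self._interval):
            self.refresh_once()

    def stop(self):
        self._stop.set()


cloud = {"version": 1, "data": {"return policy": "30 days"}}
index = HotSwapIndex(lambda: cloud["version"], lambda version: dict(cloud["data"]))
before = index.query("return policy")
cloud.update(version=2, data={"return policy": "60 days"})
swapped = index.refresh_once()
after = index.query("return policy")
print(before, swapped, after)
```

Queries keep reading the old copy until the new one is fully loaded, which is why a swap of this shape does not interrupt service.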
Tuning the refresh interval
polling_interval_in_seconds controls how often Moss checks for a new version - not how
fast a build completes. Match it to how quickly your data changes:
| Data change frequency | Suggested interval | Notes |
|---|---|---|
| Near-real-time (live inventory, pricing) | 30-60 s | Frequent polls; keep builds short, since build time adds to end-to-end latency |
| Regular updates (daily policy changes) | 300-600 s | The default of 600 s fits here |
| Infrequent (quarterly docs, stable FAQs) | 1800 s+ | Low overhead; practically always fresh |
Independent per-index refresh
Each index manages its own refresh lifecycle on its own timer, so you can tune staleness tolerance per knowledge domain. A slow rebuild on one index never blocks queries or refreshes on another.
Persist: device to cloud
A session accumulates context locally during an interaction. push_index() syncs it back to
the cloud - at the end, or at checkpoints - so it survives the session and any agent can
resume it (see Cross-agent context & omni-channel handoff).
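One way to combine checkpoint pushes with a final push is sketched below. `FakeSession`, `remember`, and `run_interaction` are hypothetical stand-ins written for this example; only the name `push_index()` comes from the text, and its real signature (for instance, whether it is asynchronous) is not assumed here.

```python
class FakeSession:
    """Hypothetical stand-in for a session that accumulates context locally."""

    def __init__(self):
        self.local_docs = []
        self.cloud_docs = []
        self.pushes = 0

    def remember(self, text):
        # Hypothetical helper: record one piece of context on the device.
        self.local_docs.append(text)

    def push_index(self):
        # Stand-in for syncing the local context back to the cloud.
        self.cloud_docs = list(self.local_docs)
        self.pushes += 1


def run_interaction(session, turns, checkpoint_every=2):
    """Record each turn, push at checkpoints, and always push once at the end."""
    try:
        for i, turn in enumerate(turns, start=1):
            session.remember(turn)
            if i % checkpoint_every == 0:
                session.push_index()  # checkpoint
    finally:
        session.push_index()  # final sync, even if the interaction fails midway


session = FakeSession()
run_interaction(session, ["hi", "order 42 is late", "refund please"])
print(session.pushes, session.cloud_docs)
```

The `finally` block is a design choice for this sketch: it makes the end-of-session push happen even when the interaction raises, so the context gathered so far is still available to the next agent.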
The operational guarantee
With auto_refresh enabled, the index converges to the latest version within at most one
polling interval after the build completes, with zero service interruption. You update your
documents; the service updates itself; users get current answers and never know a refresh
happened.
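As a rough worked example of this bound: the time from submitting a mutation to every running agent serving it is at most the build duration plus one polling interval. The 45-second build time below is a made-up figure for illustration, and `worst_case_visibility_seconds` is a hypothetical helper, not part of Moss.

```python
def worst_case_visibility_seconds(build_seconds, polling_interval_in_seconds):
    """Upper bound on submit-to-served latency: build time plus one full polling interval."""
    return build_seconds + polling_interval_in_seconds


ASSUMED_BUILD_SECONDS = 45  # illustrative only; real build times depend on index size

bounds = {
    interval: worst_case_visibility_seconds(ASSUMED_BUILD_SECONDS, interval)
    for interval in (60, 600, 1800)
}
for interval, bound in bounds.items():
    print(f"interval={interval}s -> visible within {bound}s")
```

Shortening the polling interval tightens only the second term; the build duration is unaffected by it.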
Related
Live-call context
Short-term + long-term context during a call.
Sessions
The session lifecycle in depth.