Sessions are available in the Python, Swift, Elixir, and C SDKs today. JavaScript (Node)
session support is coming.
How a session works
A `SessionIndex` is a local index you open with `client.session(name)`. Every operation runs without a cloud round trip:
- `add_docs` - embeds and indexes locally (~1-5 ms per batch).
- `query` - semantic search entirely in-memory (~1-10 ms).
- `push_index` - optionally persists the index to the cloud when you’re done.
If a cloud index with that name already exists, `session()` auto-loads it (no re-embedding); otherwise the session starts empty. The workflow is identical either way.
Example
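As a rough illustration of the lifecycle described above, here is a toy, self-contained stand-in written in plain Python. It is not the SDK: `ToySessionIndex`, the bag-of-words `_embed` helper, the `top_k` parameter, and the `FAKE_CLOUD` dict are all hypothetical. Only the method names `add_docs`, `query`, and `push_index` come from this page, and their real signatures may differ.

```python
import math
from collections import Counter

# Hypothetical stand-in for cloud storage; a real session persists remotely.
FAKE_CLOUD = {}


def _embed(text):
    """Toy bag-of-words 'embedding'; a real session uses a local embedding model."""
    return Counter(text.lower().split())


def _cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySessionIndex:
    def __init__(self, name):
        self.name = name
        # Auto-load a previously pushed index (no re-embedding), else start empty.
        self.docs = list(FAKE_CLOUD.get(name, []))

    def add_docs(self, texts):
        # Embed and index locally, in memory.
        self.docs.extend((t, _embed(t)) for t in texts)

    def query(self, text, top_k=1):
        # Rank stored documents by similarity to the query, entirely in-process.
        q = _embed(text)
        ranked = sorted(self.docs, key=lambda d: _cosine(q, d[1]), reverse=True)
        return [t for t, _ in ranked[:top_k]]

    def push_index(self):
        # Optionally persist the index so a later session can load it.
        FAKE_CLOUD[self.name] = list(self.docs)


session = ToySessionIndex("call-123")
session.add_docs([
    "customer asked about refund policy",
    "agent confirmed shipping address",
])
top_hit = session.query("refund")[0]
session.push_index()

# Opening a session with the same name picks up the pushed index.
reopened = ToySessionIndex("call-123")
```

The point of the sketch is the shape of the workflow: writes and reads happen against local state, and persistence is a separate, optional step at the end.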
Why local is faster
Moving retrieval in-process removes the parts of a query that never show up in a vector database’s own benchmark but dominate real-world latency:
- Network round trips - even same-region calls cost ~10-50 ms; cross-region or edge deployments cost far more. In-process, that drops to microseconds.
- Serialization and connection overhead - no request encoding, TLS, or connection-pool management on the hot path.
- Embedding - the query is embedded by a local quantized model in ~3 ms instead of a separate ~22 ms network embedding step.
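To make the comparison concrete, the approximate figures quoted above can simply be added up. This is a back-of-the-envelope sketch, not a benchmark: it assumes the network hop and the remote embedding step happen one after the other, it ignores search time on both sides, and the variable names are illustrative.

```python
# Approximate per-query overheads in milliseconds, taken from the figures above.
network_round_trip_ms = (10, 50)  # same-region call: low and high estimate
remote_embedding_ms = 22          # separate network embedding step
local_embedding_ms = 3            # local quantized model

# Overhead a remote query pays before any vector search happens
# (assumes the hop and the embedding call are sequential).
remote_overhead_ms = tuple(hop + remote_embedding_ms for hop in network_round_trip_ms)

# In-process, the network hop drops to microseconds, so embedding dominates.
local_overhead_ms = local_embedding_ms

saved_ms = tuple(r - local_overhead_ms for r in remote_overhead_ms)

print(f"remote overhead: {remote_overhead_ms[0]}-{remote_overhead_ms[1]} ms")
print(f"local overhead:  ~{local_overhead_ms} ms")
print(f"saved per query: {saved_ms[0]}-{saved_ms[1]} ms")
```

Under these assumptions the remote path spends roughly 32-72 ms on overhead per query, against about 3 ms locally.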
When to use a session vs. a loaded cloud index
| Use a session when… | Use a loaded cloud index when… |
|---|---|
| You’re indexing data that’s being created right now (a live call, a chat, a working set) | You’re querying a large, stable corpus built ahead of time |
| You need writes and reads in the same millisecond budget | Reads dominate and the data changes infrequently |
| Context is per-conversation or per-user and short-lived | Knowledge is shared across many sessions |
Further reading
- Remove the network hop - the latency case for in-process retrieval, with full benchmarks.
Related
- Sessions guide - the full session lifecycle.
- Live-call context - short-term + long-term context together.