- Long-term context - durable knowledge and account facts the agent was built with (FAQs, policies, profile). You load this once at the start of the call.
- Short-term context - the working set of what the customer just said, this call, this minute. You accumulate this in a session as the call unfolds.
Sessions are available in the Python, Swift, Elixir, and C SDKs today. JavaScript (Node)
session support is coming - use Python or Swift for live-call context for now.
A single agent turn
How it works
Load long-term context from the cloud
Load your persistent knowledge index into memory so it’s ready for instant queries.
Open a session for the call
client.session(call_id) returns a local SessionIndex. If an index with that name already exists in the cloud, it auto-loads; otherwise it starts empty.
Index transcript turns as they arrive
Each add_docs call embeds and indexes locally in ~1-5 ms - fast enough to run inline during the conversation.
Query both indexes for the next agent turn
Pull relevant long-term knowledge and recent short-term context together to ground the response.
Example
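The four steps above can be sketched with a toy, in-memory stand-in. This is not the SDK: `ToyIndex`, its `query` method, the `top_k` parameter, and the word-overlap scoring are all assumptions made for illustration, and only the `add_docs` name comes from the text. It shows the shape of a single agent turn: load durable knowledge once, add transcript turns as they arrive, then query both indexes and merge the hits.

```python
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real embedding."""
    return {w.strip(".,?!").lower() for w in text.split()}


@dataclass
class ToyIndex:
    """Hypothetical in-memory index; NOT the SDK's SessionIndex."""
    docs: list[str] = field(default_factory=list)

    def add_docs(self, docs: list[str]) -> None:
        # Name mirrors the add_docs call in the text; the signature is assumed.
        self.docs.extend(docs)

    def query(self, text: str, top_k: int = 2) -> list[str]:
        # Assumed method: rank documents by word overlap with the query.
        q = _tokens(text)
        scored = [(len(q & _tokens(d)), d) for d in self.docs]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [d for _, d in ranked[:top_k]]


# 1. Long-term context: loaded once, before the call starts.
long_term = ToyIndex()
long_term.add_docs([
    "A refund is issued within 5 business days.",
    "Premium plans include priority support.",
])

# 2. Short-term context: one session per call, starting empty here.
session = ToyIndex()

# 3. Index transcript turns as they arrive.
session.add_docs(["Customer: I was charged twice for my order."])
session.add_docs(["Customer: I want a refund for the duplicate charge."])


# 4. Query both indexes and merge the hits to ground the next agent turn.
def build_context(question: str) -> list[str]:
    return long_term.query(question) + session.query(question)


context = build_context("When will the refund arrive?")
```

Here `context` holds one policy fact from the long-term index and one matching transcript turn from the session, which is the combined grounding the agent would pass to the model for its next reply.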
Two kinds of context
| | Short-term context | Long-term context |
|---|---|---|
| What | What was just said this call - working insights, the live transcript | Durable knowledge and account facts: FAQs, policies, profile, history |
| Where | A local session, built turn by turn | A persistent cloud index, loaded once |
| Lifetime | The current interaction (optionally persisted at the end) | Across every interaction |
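The Lifetime row can be illustrated with a small model of persistence. In this sketch the `cloud` dictionary and the `open_session` and `persist_session` helpers are hypothetical stand-ins, not SDK calls. They only mimic the behavior described for sessions: a known name resumes saved context, and an unknown name starts empty.

```python
# Hypothetical stand-in for cloud storage: index name -> saved transcript turns.
cloud: dict[str, list[str]] = {}


def open_session(call_id: str) -> list[str]:
    """Return a copy of the saved turns for call_id, or an empty session."""
    return list(cloud.get(call_id, []))


def persist_session(call_id: str, turns: list[str]) -> None:
    """Save a copy of the session at call end so a later call can resume it."""
    cloud[call_id] = list(turns)


# First call: the session starts empty and grows turn by turn.
first = open_session("call-123")
first.append("Customer: my router keeps dropping the connection.")
persist_session("call-123", first)

# Follow-up call: the same name resumes the earlier context.
resumed = open_session("call-123")

# An unknown name starts empty.
fresh = open_session("call-456")
```

Copying on load and on save keeps the live session independent of the stored one, so nothing changes in storage until the call ends and the session is persisted.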
Result
Every agent turn is grounded in durable knowledge and the live conversation, with no network latency on the hot path. At call end, the session is persisted to the cloud - the next call can resume it (see Cross-agent context & omni-channel handoff).
Further reading
- The retrieval latency tax - why retrieval, not the model, sets the pace of a voice agent.
Related
Sessions
The session lifecycle in depth.
Real-time local indexing
Why local sessions are sub-10 ms.