Moss - Real-time Semantic Search for Conversational AI

LiveKit provides the real-time audio pipeline (speech-to-text, LLM, text-to-speech); Moss supplies retrieval. The agent loads a Moss index (and optionally opens a session for the live conversation), then queries it in-process to ground its answers - each lookup is a local call, not a network request.

Pipeline

Two ways to build it

LiveKit integration (DIY)

Add Moss to your own LiveKit agent: load a knowledge index, open a per-call session, and expose search as function tools. Full, copy-pasteable recipe.

Managed Voice Agents

Let Moss host the agent: the Agent SDK, backend token minting, and the deploy CLI.

Prefer to start from a working app? Clone the LiveKit voice agent sample and add your own keys.

How retrieval fits

Load a Moss index into memory at startup; retrieval then runs in-process, typically in single-digit milliseconds.
For live conversation memory, open a session and index transcript turns as they arrive - see Live-call context.
Expose retrieval to the LLM as a function tool so it searches only when it needs to.
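
The three steps above can be sketched in plain Python. This is a minimal illustration of the load-then-query pattern, not the Moss SDK or the LiveKit API: `ToyIndex`, `on_transcript_turn`, and `search_knowledge` are hypothetical names, and a bag-of-words cosine score stands in for real semantic embeddings. In a real agent, the body of `search_knowledge` would be registered with the framework's function-tool mechanism.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


class ToyIndex:
    """Hypothetical in-memory index (an assumption, not the Moss SDK)."""

    def __init__(self, docs=()):
        self._docs = []
        for doc in docs:
            self.add(doc)

    def add(self, text: str) -> None:
        # Vectorize once at insert time so queries only compare vectors.
        self._docs.append((text, _vectorize(text)))

    def query(self, text: str, top_k: int = 2) -> list:
        # Entirely in-process: no network request per lookup.
        q = _vectorize(text)
        scored = [(_cosine(q, vec), doc) for doc, vec in self._docs]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]


# Step 1: load long-term knowledge into memory once, at startup.
knowledge = ToyIndex([
    "Refunds are issued within 5 business days.",
    "Support is available on weekdays from 9am to 5pm.",
])

# Step 2: a per-call session index that grows as transcript turns arrive.
session = ToyIndex()


def on_transcript_turn(speaker: str, text: str) -> None:
    """Index one transcript turn into the live session."""
    session.add(f"{speaker}: {text}")


# Step 3: the function the LLM calls only when it decides it needs context.
def search_knowledge(query: str) -> str:
    hits = knowledge.query(query) + session.query(query)
    return "\n".join(hits) if hits else "No relevant context found."


on_transcript_turn("user", "My order number is 4471.")
print(search_knowledge("how long do refunds take"))
```

Keeping the knowledge index and the session index separate means the long-term index can be loaded once and shared, while each call gets its own short-lived conversation memory.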

Related

Live-call context

Short-term session context plus long-term knowledge during a call.

Sub-10ms knowledge retrieval

The load-then-query retrieval pipeline.
