search_booking tool to call, a Moss query fires automatically on every user turn via on_user_turn_completed, injecting the results as a system message before the LLM is ever invoked. One LLM round-trip per turn instead of two.
Full example — see the Airline PNR cookbook for the complete agent, three sample PNR fixtures, index builder, and eval suite.
Tool-driven vs ambient retrieval
Privacy gate
Ambient retrieval is gated on identity verification. Untilverify_caller succeeds, on_user_turn_completed passes through without querying Moss — no booking details reach the LLM before the caller’s identity is confirmed.
What this demonstrates
| Pattern | Where to look |
|---|---|
| Ambient retrieval | on_user_turn_completed hook |
| Privacy-gated retrieval | data.caller_verified check |
| Per-user indexes (one per PNR) | load_booking, _index_name_for |
| Prompt injection defence | Untrusted-data wrapper in turn_ctx.add_message |
| Structured call summary | submit_call_summary, _build_summary |
Required tools
- Moss account with project credentials
- OpenAI API key (LLM)
- Deepgram API key (STT)
- Cartesia API key (TTS)
- Python 3.10+
Integration guide
Implement ambient retrieval
Override
on_user_turn_completed to run a Moss query before the LLM is invoked. The retrieved context is injected as a system message in the chat context. The LLM sees it as part of the conversation — no tool call, no extra round-trip.Add lifecycle and write tools
The split is clean: ambient = reads, tools = writes.
load_booking and verify_caller are the only tools that affect retrieval behaviour.Per-user indexes
Each booking gets its own Moss index (booking-xkq4p2, booking-wj7bnh, etc.). load_booking switches the active index mid-call, which means one agent can handle a caller asking about multiple bookings in the same session — just call load_booking again with the new PNR and re-verify.
The BOOKING_PNR env var lets an IVR system preload the index before the agent’s first turn, so the caller’s very first question is already grounded.