retrieve node. The graph passes the user query through retrieval, writes the Moss results into shared state, then feeds that context to a generate node — keeping all LLM responses grounded in your knowledge base.
Full example — see the LangGraph cookbook for the complete runnable demo with interactive mode, metadata filter support, and tests.
Why use Moss with LangGraph?
LangGraph’s state-machine model is a natural fit for retrieval-augmented workflows: each node reads from and writes to a shared typed state, so retrieval latency and results are transparent at every step. Moss plugs in as a single async node and keeps query latency in the 1–10ms range when the index is loaded locally, fast enough that retrieval never becomes the bottleneck in a multi-node graph.
Required tools
- Moss account with project credentials
- Groq API key (or swap in any LangChain-compatible LLM)
- Python 3.11+
Integration guide
Define the graph state
LangGraph nodes communicate through a shared TypedDict. Moss results slot in naturally alongside the query and answer fields.
Build the retrieve → generate graph
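The shared state that both nodes read and write can be sketched as a TypedDict. This is a minimal illustration, not the cookbook's definition: the class name `RAGState` and the field names `query`, `context`, and `answer` are assumptions chosen to match the description above.

```python
from typing import TypedDict


class RAGState(TypedDict, total=False):
    """Shared graph state (class and field names are illustrative assumptions)."""

    query: str          # user question, set by the caller
    context: list[str]  # passages written by the retrieve node
    answer: str         # final response written by the generate node


# Only the query is known up front; nodes fill in the other keys by
# returning partial updates that LangGraph merges into the state.
initial_state: RAGState = {"query": "How do I reset my password?"}
```

Declaring the state with `total=False` lets the caller supply only `query`, leaving `context` and `answer` to be added as the graph runs.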
The retrieve node queries Moss and writes results to state. The generate node reads that context and produces the final answer.
Load the index and run
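The retrieve and generate nodes can be sketched without any external services. In the block below, `StubRetriever` is a hypothetical stand-in for a Moss client (not the real SDK), the generate node returns a formatted string where a real graph would call an LLM, and `run` imitates a linear graph by merging each node's partial update into state. In an actual LangGraph app the same two functions would typically be registered on a `StateGraph` with `add_node` and connected START → retrieve → generate → END.

```python
import asyncio
from typing import TypedDict


class RAGState(TypedDict, total=False):
    query: str
    context: list[str]
    answer: str


class StubRetriever:
    """Hypothetical stand-in for a Moss client; not the real SDK."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    async def query(self, text: str, top_k: int = 2) -> list[str]:
        # Naive keyword-overlap ranking, only to make the sketch runnable.
        words = set(text.lower().split())
        ranked = sorted(
            self.docs, key=lambda d: -len(words & set(d.lower().split()))
        )
        return ranked[:top_k]


retriever = StubRetriever([
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first of each month.",
    "Contact support to change your billing address.",
])


async def retrieve(state: RAGState) -> dict:
    """Retrieve node: query the index and write the results into state."""
    return {"context": await retriever.query(state["query"])}


async def generate(state: RAGState) -> dict:
    """Generate node: a real graph would call an LLM with this context."""
    return {"answer": f"Based on: {state['context'][0]}"}


async def run(query: str) -> RAGState:
    # Imitates a linear graph: apply each node, merge its partial update.
    state: RAGState = {"query": query}
    for node in (retrieve, generate):
        state.update(await node(state))
    return state


result = asyncio.run(run("how do I reset my password"))
```

Because each node returns only the keys it changes, the retrieval step can be swapped for a real client without touching the generate node.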
Call load_index() before the graph runs. This pulls the index into local memory and keeps retrieval on the ~1–10ms in-memory path instead of the cloud fallback (~100–500ms). Metadata filters also require a locally loaded index to work correctly.
You can pass an optional metadata_filter through graph state to scope retrieval to a specific category.
How it works
query() call inside the graph hits the in-memory path, so even graphs with many retrieval steps stay fast.