Moss - Real-time Semantic Search for Conversational AI

Moss is the runtime for real-time semantic search in conversational apps. It delivers sub-10 ms lookups and instant index updates without extra infrastructure. It runs in the browser, on-device, or in the cloud - wherever your agent lives - so search feels native. Connect your data once; Moss packages, distributes, and keeps indexes fresh. Because the index lives next to your agent, retrieval is a function you call, not a service you query.

Sub-10 ms lookups with instant updates
Real-time sessions (Python, JavaScript, Swift, Elixir, C): index and query locally during a conversation, then sync to the cloud
No infra to run; local-first with optional sync
Browser, device, or cloud - same API

Use cases

Sub-10 ms answers for docs, FAQ, and in-app search
Grounding agents with your data without centralizing user info
Local or hybrid embeddings with minimal infrastructure

Capabilities

Real-Time Local Indexing

Index and query on-device in milliseconds, with no network round trip.

Live-Call Context

Short-term session context plus long-term knowledge, during a call.
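One way to picture combining short-term session context with long-term knowledge is as a merge of two ranked result lists. The sketch below is illustrative only and is not the Moss API: `Hit`, `merge_hits`, and the `session_boost` knob are hypothetical names, and the scores are made-up values standing in for whatever similarity score each index returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    text: str
    score: float  # higher means more relevant
    source: str   # "session" or "knowledge"


def merge_hits(session_hits, knowledge_hits, k=3, session_boost=0.1):
    """Merge two ranked lists into a single top-k list.

    session_boost is a hypothetical knob that favors facts learned
    during the current call over long-term knowledge.
    """
    boosted = [Hit(h.text, h.score + session_boost, h.source) for h in session_hits]
    # Deduplicate on text, keeping the best-scoring copy of each hit.
    best = {}
    for hit in boosted + list(knowledge_hits):
        if hit.text not in best or hit.score > best[hit.text].score:
            best[hit.text] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:k]


# Made-up example data for illustration.
session = [Hit("Caller's order number is 1042", 0.80, "session")]
knowledge = [
    Hit("Refunds take 5 business days", 0.85, "knowledge"),
    Hit("Shipping is free over $50", 0.40, "knowledge"),
]
top = merge_hits(session, knowledge, k=2)
```

With the boost applied, the session fact ranks ahead of the refund policy even though its raw score is lower. Whether to boost session context at all is a design choice that depends on the application.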

Data Hydration & Sync

Hydrate from the cloud and stay fresh with zero-downtime hot-swaps.
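A zero-downtime hot-swap can be sketched as building the replacement index off to the side and then publishing it with a single reference assignment, so readers never observe a half-built index. This is a minimal illustration under stated assumptions, not how Moss implements it: `IndexHolder` is a hypothetical name, and a plain dict stands in for a real index.

```python
import threading


class IndexHolder:
    """Holds the live index and lets writers replace it atomically."""

    def __init__(self, index):
        self._index = index
        self._lock = threading.Lock()  # serializes concurrent writers

    def current(self):
        # Readers take a snapshot of the reference and keep using it,
        # even if a swap happens while their query is in flight.
        return self._index

    def swap(self, new_index):
        # The new index is fully built before this call, so publishing
        # it is a single reference assignment.
        with self._lock:
            old, self._index = self._index, new_index
        return old

    def query(self, key):
        index = self.current()
        return index.get(key)


# A dict stands in for a real index in this sketch.
holder = IndexHolder({"hours": "9-5"})
before = holder.query("hours")

fresh = {"hours": "24/7"}  # hydrated elsewhere, e.g. from the cloud
previous = holder.swap(fresh)
after = holder.query("hours")
```

Queries that started before the swap finish against the old index, and queries that start after it see the new one.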

Cross-Agent Handoff

Carry full context across agents, channels, and devices.

SDKs

One programming model across every surface: JavaScript (Node), Python, Swift (iOS), Elixir, C, and an in-browser/WASM build.

Using Moss Portal

Sign up at Moss, confirm your email, and sign in
From the portal, click Create Index, then copy your Project ID and Project Key to use in your SDK
Join our Discord to get onboarded: Moss Discord

Samples

View the samples repo: moss on GitHub
JavaScript: javascript/comprehensive_sample.ts, javascript/load_and_query_sample.ts
Python: python/comprehensive_sample.py, python/load_and_query_sample.py
Adapt by swapping the FAQ data with your own, or plug Moss calls into your app

How it works (at a glance)

Index: Convert your data into an efficient local index
Embeddings: Generate vectors on-device (moss-minilm, moss-mediumlm) or supply your own
Sessions: Index and query locally in real time during a live interaction, then sync to the cloud
Retrieval: Load the index, then query in-memory with semantic or hybrid search
Storage: Persist indexes locally and optionally sync to the cloud
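The steps above can be sketched end to end in a few lines of plain Python. This is not the Moss SDK: `LocalIndex`, `embed`, and `cosine` are hypothetical names, and the bag-of-words `embed` function is a toy stand-in for a real embedding model such as the ones named above. It matches on shared words only, so it shows the mechanics of in-memory vector search but not true semantic matching.

```python
import json
import math
import os
import tempfile
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: token counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class LocalIndex:
    """In-memory index: documents plus one vector per document."""

    def __init__(self, docs=None):
        self.docs = list(docs or [])
        self.vectors = [embed(d) for d in self.docs]

    def add(self, doc):
        # Updates are visible to the next query with no rebuild step.
        self.docs.append(doc)
        self.vectors.append(embed(doc))

    def query(self, text, k=1):
        q = embed(text)
        scored = [(doc, cosine(q, vec)) for doc, vec in zip(self.docs, self.vectors)]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

    def save(self, path):
        # Persist only the documents; vectors are recomputed on load.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.docs, f)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))


# Index, update during a session, and query in memory.
index = LocalIndex(["how do i reset my password", "what are your support hours"])
index.add("where can i download invoices")
best_doc, best_score = index.query("reset password")[0]

# Persist locally and load again.
path = os.path.join(tempfile.mkdtemp(), "index.json")
index.save(path)
restored = LocalIndex.load(path)
```

Because the index is an ordinary in-process object, a lookup is a method call with no network hop, which is the property the overview describes as retrieval being a function you call.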

Next steps

Quickstart
