Moss embeds documents on-device using a built-in model, so the text and resulting vectors stay on the machine and embedding stays off the network path. You choose the model at index creation.

How it works

Setup

Pick the on-device model at index creation: moss-minilm (fast, lightweight) or moss-mediumlm (higher accuracy). Moss then embeds your documents on-device with that model. If you’d rather supply precomputed vectors from your own pipeline, see Custom embeddings.
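import { MossClient } from '@moss-dev/moss'

// Credentials are read from the environment.
const client = new MossClient(process.env.MOSS_PROJECT_ID!, process.env.MOSS_PROJECT_KEY!)

// `docs` is your array of documents to index, defined elsewhere.
// `modelId` selects the on-device embedding model for this index.
await client.createIndex('local-embeddings', docs, { modelId: 'moss-minilm' })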
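
If the model choice varies by deployment, it can help to constrain it to the two IDs named above so that a typo fails at compile time. This is a minimal sketch: `MossModelId` and `pickModelId` are hypothetical names for illustration and are not part of the Moss SDK.

```typescript
// The two on-device model IDs named in this guide.
type MossModelId = 'moss-minilm' | 'moss-mediumlm'

// Hypothetical helper: trade speed for accuracy with a single flag.
function pickModelId(preferAccuracy: boolean): MossModelId {
  return preferAccuracy ? 'moss-mediumlm' : 'moss-minilm'
}

// Default to the fast, lightweight model.
const modelId: MossModelId = pickModelId(false)
```

The resulting value can be passed as the `modelId` option in the `createIndex` call above.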

Tips

  • Batch inputs for speed
  • Cache vectors for unchanged content
  • Use hybrid retrieval for best relevance
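
The first two tips can be sketched with plain helpers that run before any documents reach Moss. This is an illustration under assumptions, not Moss's own API: the `Doc` shape and the `toBatches` and `selectChanged` helpers are hypothetical, and how each batch is submitted to an index is left out.

```typescript
// Assumed document shape for illustration; check the Moss SDK for the real type.
interface Doc {
  id: string
  text: string
}

// Split items into fixed-size batches so they can be submitted in groups.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer')
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize))
  }
  return batches
}

// Return only documents that are new or whose text differs from the last
// indexed version, and record their current text in `lastIndexed`.
function selectChanged(docs: Doc[], lastIndexed: Map<string, string>): Doc[] {
  const changed: Doc[] = []
  for (const doc of docs) {
    if (lastIndexed.get(doc.id) !== doc.text) {
      changed.push(doc)
      lastIndexed.set(doc.id, doc.text)
    }
  }
  return changed
}
```

Running `selectChanged` first and then `toBatches` on its result means unchanged content is never re-embedded. Storing a content hash in `lastIndexed` instead of the full text would reduce memory use for large corpora.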

Custom embeddings

Bring your own precomputed vectors.

Sub-10ms knowledge retrieval

The local retrieval pipeline.