Why local
Embedding on-device keeps data on the machine and takes the network off the hot path. It is also faster: a local quantized model embeds a short query in around 3 ms, versus roughly 22 ms for a separate network embedding call. For real-time work, that difference - paid on every query - is the gap between a response that feels instant and one that lags.How it works
Setup
Pick the on-device model at index creation:moss-minilm (fast, lightweight) or moss-mediumlm (higher accuracy). Moss embeds your documents on-device with the model you choose. If you’d rather supply precomputed vectors from your own pipeline, see Custom embeddings.
Tips
- Batch inputs for speed
- Cache vectors for unchanged content
- Use hybrid retrieval for best relevance
Related
Custom embeddings
Bring your own precomputed vectors.
Sub-10ms knowledge retrieval
The local retrieval pipeline.