- Metadata Filtering: `query()` now accepts an optional `filter` dict to narrow results by document metadata on locally loaded indexes
  - Comparison operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`
  - Set operators: `$in`, `$nin`
  - Composable with `$and`/`$or` for complex predicates (supports arbitrary nesting)
  - Numeric coercion: int and float filter values are automatically converted to strings for consistent matching
- Geo-distance filtering: the new `$near` operator filters documents by haversine distance from a `"lat,lng,radiusMeters"` value
- When `filter` is passed to `query()` but the index is not loaded locally, a warning is logged and the filter is skipped (the cloud query API does not yet support filtering)
- Updated `inferedge-moss-core` dependency to `0.6.0`
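Combined, these operators compose into a nested filter dict. The sketch below shows an assumed shape based on the operator list above (the `query()` call itself is omitted, and the field names `category`, `views`, `lang`, `stars`, `location` are made up), with the documented numeric coercion mimicked locally:

```python
# Hypothetical filter for query(); operator names come from the changelog,
# the metadata field names are illustrative.
metadata_filter = {
    "$and": [
        {"category": {"$in": ["docs", "blog"]}},
        {"views": {"$gte": 100}},  # numeric values are coerced to strings
        {"$or": [
            {"lang": {"$eq": "en"}},
            {"stars": {"$lt": 5}},
        ]},
        {"location": {"$near": "37.77,-122.42,5000"}},  # "lat,lng,radiusMeters"
    ]
}

def coerce_numbers(node):
    """Mimic the documented numeric coercion: int/float values become strings."""
    if isinstance(node, dict):
        return {k: coerce_numbers(v) for k, v in node.items()}
    if isinstance(node, list):
        return [coerce_numbers(v) for v in node]
    if isinstance(node, (int, float)) and not isinstance(node, bool):
        return str(node)
    return node

coerced = coerce_numbers(metadata_filter)
print(coerced["$and"][1]["views"]["$gte"])  # "100"
```

Because `$and`/`$or` nest arbitrarily, the same walk applies at any depth.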
- Bumped `inferedge-moss-core` dependency to `0.5.0` to support session index telemetry and `push_index` improvements
- All index mutations and reads now go through the Rust ManageClient, replacing the Python HTTP layer
- Index creation uses an async bulk pipeline: binary upload → server-side build → poll until completion
- `load_index` supports both V1 and V2 binary formats, with cloud query fallback when the index isn't loaded locally
- New return type `MutationResult` (with `job_id`, `index_name`, `doc_count`) for `create_index`, `add_docs`, and `delete_docs`
- `get_docs` takes `doc_ids` directly instead of wrapping them in `GetDocumentsOptions`
- Query latency reduced from ~2,300ms to ~10ms for 100K vectors
- Optimized the search pipeline to reduce memory allocations
- Significantly reduced memory overhead for hybrid (keyword + semantic) search on large indexes (100K+ documents)
- Enhanced performance across all index sizes
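The new `MutationResult` return shape can be sketched as a plain dataclass. This is an illustrative stand-in using the field names noted above, not the SDK's actual class:

```python
from dataclasses import dataclass

# Illustrative stand-in for the SDK's MutationResult return type.
@dataclass
class MutationResult:
    job_id: str      # identifier of the async bulk-pipeline job
    index_name: str
    doc_count: int   # document count after the mutation

# create_index, add_docs, and delete_docs now return this shape, e.g.:
result = MutationResult(job_id="job-123", index_name="articles", doc_count=42)
print(result.index_name, result.doc_count)  # articles 42
```

Correspondingly, `get_docs` now takes the `doc_ids` list directly rather than a `GetDocumentsOptions` wrapper.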
- Hot Reload & Auto-Refresh: Indexes can now automatically detect and reload when updated in the cloud.
- `load_index()` now accepts optional `auto_refresh` and `polling_interval_in_seconds` parameters
- When `auto_refresh` is enabled, the SDK polls for updates at the configured interval (default: 600 seconds)
- To stop auto-refresh, call `load_index()` again without the `auto_refresh` option
- `load_index()` now allows reloading an already-loaded index (previously this threw an error)
- Index management now uses the Rust core for improved performance and reliability
- Adds partial support for Python 3.14 by disabling local embedding service functionality. Full support coming soon.
- Adds support for user-supplied embeddings.
- `query()` now automatically falls back to the cloud API when the index is not loaded locally, enabling queries without calling `load_index()` first.
- Adds better scoring evaluation for search results.
- Removes the `<2` upper bound on the numpy dependency.
- Drops support for Python 3.9 and below.
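The local-or-cloud routing described above can be sketched as a simple dispatch; the function and parameter names here are illustrative, not the SDK's:

```python
def query_with_fallback(index_loaded, run_local_query, run_cloud_query):
    """Illustrative routing: use the in-process index when it has been
    loaded, otherwise fall back to the cloud query API."""
    if index_loaded:
        return run_local_query()
    return run_cloud_query()

# Without a prior load_index(), the query transparently goes to the cloud:
result = query_with_fallback(False, lambda: "local hits", lambda: "cloud hits")
print(result)  # cloud hits
```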
- Bug fix: keyword search now functions correctly after `load_index()`.
- New service endpoint with significant infrastructure upgrades: management operations are now ~3× faster across most real-world use cases and support larger payloads.
- Updates `inferedge-moss-core` dependency to version 0.2.3 for new ARM64 wheel support.
- Adds IntelliSense support in all IDEs.
- Adds support for keyword search and alpha blending between keyword and semantic search.
- Removes the Pipecat integration and `MossContextRetriever` from the SDK; they will soon be offered as a Pipecat extension instead.
- Performance improvements for `query()` calls.
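Alpha blending of keyword and semantic scores is typically a convex combination, as in the sketch below; the weighting convention (higher alpha favours semantic) is an assumption, not taken from the SDK docs:

```python
def blend(keyword_score: float, semantic_score: float, alpha: float = 0.5) -> float:
    """Convex combination of the two rankings' scores: alpha=1.0 is purely
    semantic, alpha=0.0 purely keyword (convention assumed)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * semantic_score + (1.0 - alpha) * keyword_score

print(blend(keyword_score=0.2, semantic_score=0.8, alpha=0.5))  # 0.5
```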
- `MossContextRetriever`: Added Pipecat integration for real-time voice AI applications
- Automatically enhances LLM conversations with semantic search results from Moss indexes
- Seamless integration with OpenAI LLM context frames
Initial release of `inferedge-moss` with core features:
- Semantic search using transformer-based embeddings
- Lightweight embedding models for edge computing; supports the proprietary `moss-minilm` model
- API key validation with secure host access
- Cloudflare CDN support for fast model loading
- Multi-index support for isolated search spaces
- Add, update, and remove items across indexes
- Query interface with configurable result count
- Performance metrics tracking