Snapshot testing + current results snapshot #5
Merged
piobab merged 2 commits into BaseModelAI:master on Nov 17, 2020
PR author (contributor) commented:

It works correctly on: … For stable-armv7-unknown-linux-gnueabihf (ARM Raspberry Pi), however, the order of HashMap entries in the debug string is different, so the snapshots for SparseMatrices didn't match. The final result is still correct. To prevent this kind of issue, I can either stop collecting snapshots for the intermediate SparseMatrices or work out another way to make the snapshots stable.
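One way to make such snapshots stable, instead of dropping them, is to sort map entries before rendering them. Rust's standard `HashMap` makes no guarantee about iteration order, so its `Debug` output can differ between runs or platforms; copying the entries into a `BTreeMap` yields key-ordered, reproducible output. This is a minimal sketch under that assumption; `stable_debug` is a hypothetical helper, not part of Cleora.

```rust
use std::collections::{BTreeMap, HashMap};
use std::fmt::Debug;

/// Hypothetical helper: renders a HashMap's Debug output with entries
/// sorted by key, so the string no longer depends on hashing order.
fn stable_debug<K: Ord + Debug, V: Debug>(map: &HashMap<K, V>) -> String {
    // BTreeMap iterates in key order, so its Debug output is deterministic.
    let sorted: BTreeMap<&K, &V> = map.iter().collect();
    format!("{:?}", sorted)
}

fn main() {
    let mut m = HashMap::new();
    m.insert(3u32, "c");
    m.insert(1u32, "a");
    m.insert(2u32, "b");
    // `{:?}` on `m` directly has unspecified order; the sorted rendering does not.
    println!("{}", stable_debug(&m));
}
```

The trade-off is an extra allocation and an O(n log n) sort per snapshot, which is usually negligible in tests.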
piobab reviewed on Nov 16, 2020
```rust
}

#[derive(Debug, Default)]
pub struct InMemoryEmbeddingPersistor {
```
Contributor commented:

I think you can remove `pub` from the InMemoryEmbeddingPersistor and InMemoryEntity structs.
PR author (contributor) commented:

Removed the snapshot for sparse matrices: it is an intermediate result that contains a hash table, whose iteration order can differ in some environments (as on ARM), which breaks snapshot comparison.
piobab approved these changes on Nov 17, 2020
jaroslawkrolewski pushed a commit to jaroslawkrolewski/cleora that referenced this pull request on Mar 18, 2026
Created two new modules for pycleora:

pycleora/search.py - ANNIndex class for approximate nearest neighbor search:
- ANNIndex(graph, embeddings, method="hnsw"|"brute") builds an ANN index
- .query(entity_id, top_k=10, exclude_self=True) returns results in same format as find_most_similar, with matching exclude_self parameter for API parity
- .query_vector(vector, top_k=10) for raw vector queries
- HNSW backend via optional hnswlib dependency (try/except ImportError pattern)
- Pure-numpy ball tree fallback when hnswlib is not installed
- Brute force method for exact baseline comparison
- Input validation for top_k parameter

pycleora/compress.py - Embedding compression utilities:
- pca_compress(embeddings, target_dim) - PCA via np.linalg.svd
- random_projection(embeddings, target_dim, seed=None) - fast alternative to PCA
- product_quantize(embeddings, num_subspaces, num_centroids) - standard PQ with k-means on subspaces, returns PQIndex with .reconstruct() and .search() methods
- Input validation for all parameters (target_dim, num_subspaces, num_centroids, max_iter, embeddings shape)

Updated pycleora/__init__.py to import both modules so users can access pycleora.search.* and pycleora.compress.*. Existing find_most_similar function is untouched for backward compatibility. Result format (entity_id, index, similarity keys) matches find_most_similar exactly.

Replit-Task-Id: a2415004-3c28-4c96-b8e1-ed1365533871
Snapshot testing takes advantage of Cleora's deterministic behavior: any discrepancy between the original snapshot and the current results can then be reviewed together with the code change that introduced it.
The test introduced by this PR runs Cleora on a sample case and saves the results to a snapshot file.
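The snapshot idea can be sketched with the standard library alone, assuming the result type implements `Debug` (as the structs in this PR derive it). `check_snapshot` and the file path below are hypothetical; the actual test may rely on a snapshot-testing crate instead. On the first run the current result is recorded; on later runs it is compared with the stored file, which only works because the output is deterministic.

```rust
use std::fmt::Debug;
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Hypothetical snapshot check: records `value` on first use,
/// otherwise compares it with the previously stored rendering.
fn check_snapshot<T: Debug>(path: &Path, value: &T) -> Result<(), String> {
    // Pretty-printed Debug output keeps diffs readable line by line.
    let current = format!("{:#?}", value);
    match fs::read_to_string(path) {
        Ok(stored) if stored == current => Ok(()),
        Ok(stored) => Err(format!(
            "snapshot mismatch for {}:\n--- stored\n{}\n+++ current\n{}",
            path.display(),
            stored,
            current
        )),
        // No snapshot yet: store the current result as the reference.
        Err(e) if e.kind() == ErrorKind::NotFound => {
            fs::write(path, &current).map_err(|e| e.to_string())
        }
        Err(e) => Err(e.to_string()),
    }
}

fn main() {
    let path = std::env::temp_dir().join("example_embeddings.snap");
    let _ = fs::remove_file(&path);
    let result = vec![("a", vec![0.5f32, -0.5]), ("b", vec![1.0, 0.0])];
    // First call records the snapshot; second call compares against it.
    check_snapshot(&path, &result).unwrap();
    check_snapshot(&path, &result).unwrap();
    let _ = fs::remove_file(&path);
}
```

When a mismatch is reported, the diff between the stored and current renderings is what gets reviewed alongside the code change.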