Add domain evaluation results (base vs fine-tuned) and fix model ID
README.md CHANGED

@@ -583,7 +583,7 @@ Then you can load this model and run inference.
 from sentence_transformers import SparseEncoder
 
 # Download from the 🤗 Hub
-model = SparseEncoder("
+model = SparseEncoder("oneryalcin/fin-sparse-encoder-doc-v1")
 # Run inference
 queries = [
     "How much did the company charge for depreciation of tangible assets in 2025?",

@@ -630,6 +630,22 @@ You can finetune this model on your own dataset.
 
 ## Evaluation
 
+### Domain Evaluation: Financial Document Retrieval
+
+Evaluated on 2,028 held-out financial test examples (SEC filings + earnings call transcripts). Each example has 1 query + 1 positive + up to 7 hard negatives. We re-rank candidates per query using sparse dot-product similarity.
+
+| Metric      | Base Model | Fine-tuned (2 epochs) | Delta  |
+|:------------|:-----------|:----------------------|:-------|
+| **acc@1**   | 39.9%      | **55.2%**             | +15.2% |
+| **acc@3**   | 69.2%      | **84.0%**             | +14.8% |
+| **acc@5**   | 84.1%      | **93.7%**             | +9.6%  |
+| **mrr@10**  | 0.580      | **0.710**             | +13.0% |
+| **ndcg@10** | 0.681      | **0.781**             | +10.0% |
+| mean_rank   | 2.90       | **2.09**              | -0.81  |
+| median_rank | 2.0        | **1.0**               | -1.0   |
+
+The fine-tuned model ranks the correct document first (median_rank=1.0) more often than not, compared to position 2 for the base model. Sparsity also improved: corpus active dims decreased from ~2,500 to ~1,666, meaning faster inverted index lookups at inference time.
+
 ### Metrics
 
 #### Sparse Information Retrieval
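The per-query re-ranking the added section describes (one query scored against its positive and hard negatives by sparse dot product, then sorted) can be sketched in a few lines. This is a toy illustration only: the term weights below are made up, not actual SparseEncoder outputs, and a real run would encode text with the model instead.

```python
def sparse_dot(q, d):
    """Dot product of two sparse vectors stored as {dim: weight} dicts."""
    return sum(w * d.get(dim, 0.0) for dim, w in q.items())

def rerank(query, candidates):
    """Return candidate indices sorted by descending sparse similarity."""
    scores = [sparse_dot(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: -scores[i])

# Toy example: one query, its positive (index 0), and two hard negatives.
query = {"depreciation": 1.2, "2025": 0.8}
candidates = [
    {"depreciation": 0.9, "2025": 0.7},          # positive: shares both active terms
    {"revenue": 0.5, "guidance": 0.3},           # hard negative: no overlap
    {"depreciation": 0.4, "amortization": 0.2},  # hard negative: partial overlap
]

order = rerank(query, candidates)
rank = order.index(0) + 1  # 1-based rank of the positive document

# Per-query contributions to the metrics in the table above:
acc_at_1 = rank == 1                             # averaged over queries -> acc@1
mrr_at_10 = 1.0 / rank if rank <= 10 else 0.0    # averaged over queries -> mrr@10
print(rank, acc_at_1, mrr_at_10)                 # -> 1 True 1.0
```

Storing vectors as `{dim: weight}` dicts also shows why fewer active dims speed up retrieval: the dot product only touches dimensions where the query is non-zero, which is exactly what an inverted index exploits.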