oneryalcin committed
Commit d37ba28 · verified · 1 Parent(s): a107131

Add domain evaluation results (base vs fine-tuned) and fix model ID

Files changed (1): README.md (+17 −1)
README.md CHANGED

```diff
@@ -583,7 +583,7 @@ Then you can load this model and run inference.
 from sentence_transformers import SparseEncoder
 
 # Download from the 🤗 Hub
-model = SparseEncoder("sparse_encoder_model_id")
+model = SparseEncoder("oneryalcin/fin-sparse-encoder-doc-v1")
 # Run inference
 queries = [
     "How much did the company charge for depreciation of tangible assets in 2025?",
@@ -630,6 +630,22 @@ You can finetune this model on your own dataset.
 
 ## Evaluation
 
+### Domain Evaluation: Financial Document Retrieval
+
+Evaluated on 2,028 held-out financial test examples (SEC filings + earnings call transcripts). Each example pairs one query with one positive document and up to 7 hard negatives; candidates are re-ranked per query by sparse dot-product similarity.
+
+| Metric      | Base Model | Fine-tuned (2 epochs) | Delta    |
+|:------------|:-----------|:----------------------|:---------|
+| **acc@1**   | 39.9%      | **55.2%**             | +15.2 pp |
+| **acc@3**   | 69.2%      | **84.0%**             | +14.8 pp |
+| **acc@5**   | 84.1%      | **93.7%**             | +9.6 pp  |
+| **mrr@10**  | 0.580      | **0.710**             | +0.130   |
+| **ndcg@10** | 0.681      | **0.781**             | +0.100   |
+| mean_rank   | 2.90       | **2.09**              | −0.81    |
+| median_rank | 2.0        | **1.0**               | −1.0     |
+
+The fine-tuned model ranks the correct document first more often than not (median_rank = 1.0), versus a typical rank of 2 for the base model. Sparsity also improved: corpus active dimensions dropped from ~2,500 to ~1,666, meaning faster inverted-index lookups at inference time.
+
 ### Metrics
 
 #### Sparse Information Retrieval
```
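
The re-ranking protocol the new Evaluation section describes (score each query's positive against its hard negatives by sparse dot product, take the positive's rank, then aggregate acc@k and MRR over queries) can be sketched as follows. This is a minimal illustration, not code from the repository: the helper names and toy vectors are invented, and sparse embeddings are represented as plain `{dimension: weight}` dicts.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors given as {dim_index: weight} dicts."""
    if len(b) < len(a):  # iterate over the smaller vector
        a, b = b, a
    return sum(w * b[i] for i, w in a.items() if i in b)

def rank_of_positive(query_vec, positive_vec, negative_vecs):
    """1-based rank of the positive among all candidates (ties count against it)."""
    pos_score = sparse_dot(query_vec, positive_vec)
    return 1 + sum(sparse_dot(query_vec, v) >= pos_score for v in negative_vecs)

def retrieval_metrics(ranks, ks=(1, 3, 5)):
    """acc@k, mrr@10, and mean rank from a list of per-query positive ranks."""
    n = len(ranks)
    metrics = {f"acc@{k}": sum(r <= k for r in ranks) / n for k in ks}
    metrics["mrr@10"] = sum(1.0 / r for r in ranks if r <= 10) / n
    metrics["mean_rank"] = sum(ranks) / n
    return metrics

# Toy example: one query, one positive, two hard negatives.
q    = {3: 1.2, 17: 0.8}
pos  = {3: 0.9, 17: 0.5, 42: 0.1}
negs = [{3: 0.2, 99: 1.0}, {17: 0.3}]
print(rank_of_positive(q, pos, negs))  # → 1 (the positive outscores both negatives)
```

With up to 7 hard negatives per query, every rank is at most 8, so acc@k and mrr@10 here are computed over the full candidate list, matching the held-out evaluation's setup.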