Add domain evaluation results (base vs fine-tuned) and fix model ID
README.md CHANGED

@@ -583,7 +583,7 @@ Then you can load this model and run inference.
 from sentence_transformers import SparseEncoder
 
 # Download from the 🤗 Hub
-model = SparseEncoder("
+model = SparseEncoder("oneryalcin/fin-sparse-encoder-doc-v1")
 # Run inference
 queries = [
     "How much did the company charge for depreciation of tangible assets in 2025?",

@@ -630,6 +630,22 @@ You can finetune this model on your own dataset.
 
 ## Evaluation
 
+### Domain Evaluation: Financial Document Retrieval
+
+Evaluated on 2,028 held-out financial test examples (SEC filings + earnings call transcripts). Each example has 1 query + 1 positive + up to 7 hard negatives. We re-rank candidates per query using sparse dot-product similarity.
+
+| Metric      | Base Model | Fine-tuned (2 epochs) | Delta  |
+|:------------|:-----------|:----------------------|:-------|
+| **acc@1**   | 39.9%      | **55.2%**             | +15.2% |
+| **acc@3**   | 69.2%      | **84.0%**             | +14.8% |
+| **acc@5**   | 84.1%      | **93.7%**             | +9.6%  |
+| **mrr@10**  | 0.580      | **0.710**             | +13.0% |
+| **ndcg@10** | 0.681      | **0.781**             | +10.0% |
+| mean_rank   | 2.90       | **2.09**              | -0.81  |
+| median_rank | 2.0        | **1.0**               | -1.0   |
+
+The fine-tuned model ranks the correct document first (median_rank=1.0) more often than not, compared to position 2 for the base model. Sparsity also improved: corpus active dims decreased from ~2,500 to ~1,666, meaning faster inverted index lookups at inference time.
+
 ### Metrics
 
 #### Sparse Information Retrieval
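The per-query re-ranking the added section describes (one query scored against its positive and hard negatives by sparse dot product, then sorted) can be sketched in a few lines. This is a toy illustration only: the term weights below are made up, not actual SparseEncoder outputs, and a real run would encode text with the model instead.

```python
def sparse_dot(q, d):
    """Dot product of two sparse vectors stored as {dim: weight} dicts."""
    return sum(w * d.get(dim, 0.0) for dim, w in q.items())

def rerank(query, candidates):
    """Return candidate indices sorted by descending sparse similarity."""
    scores = [sparse_dot(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: -scores[i])

# Toy example: one query, its positive (index 0), and two hard negatives.
query = {"depreciation": 1.2, "2025": 0.8}
candidates = [
    {"depreciation": 0.9, "2025": 0.7},          # positive: shares both active terms
    {"revenue": 0.5, "guidance": 0.3},           # hard negative: no overlap
    {"depreciation": 0.4, "amortization": 0.2},  # hard negative: partial overlap
]

order = rerank(query, candidates)
rank = order.index(0) + 1  # 1-based rank of the positive document

# Per-query contributions to the metrics in the table above:
acc_at_1 = rank == 1                             # averaged over queries -> acc@1
mrr_at_10 = 1.0 / rank if rank <= 10 else 0.0    # averaged over queries -> mrr@10
print(rank, acc_at_1, mrr_at_10)                 # -> 1 True 1.0
```

Storing vectors as `{dim: weight}` dicts also shows why fewer active dims speed up retrieval: the dot product only touches dimensions where the query is non-zero, which is exactly what an inverted index exploits.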