BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on 6,300 financial question–passage pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with a Matryoshka objective, its embeddings can also be truncated to 512, 256, 128, or 64 dimensions with only a modest drop in retrieval quality (see Evaluation).

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
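
In plain transformers terms, this stack is CLS-token pooling followed by L2 normalization, so dot products between embeddings equal cosine similarities. A minimal sketch of the equivalent computation (assuming the repository's transformer weights load via AutoModel, as is standard for sentence-transformers repositories):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ssaba/bge-base-financial-matryoshka")
encoder = AutoModel.from_pretrained("ssaba/bge-base-financial-matryoshka")

batch = tokenizer(["example sentence"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    output = encoder(**batch)
cls = output.last_hidden_state[:, 0]  # CLS-token pooling, per the Pooling config
embedding = torch.nn.functional.normalize(cls, p=2, dim=1)  # the Normalize() module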

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ssaba/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'What was the significant tax benefit recorded in 2023 for federal research and development credits?',
    'The company recorded a tax benefit of approximately $600 million in 2023 related to federal research and development credits, based on updated estimates of qualifying expenditures from the 2022 U.S. federal R&D credit.',
    'The Phase 3 OAKTREE trial of obeldesivir in non-hospitalized participants without risk factors for developing severe COVID-19 did not meet its primary endpoint of improvement in time to symptom alleviation. Obeldesivir was well-tolerated in this large study population.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
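
Because the model was trained with a Matryoshka objective (see Training Details), embeddings can also be truncated at load time to 512, 256, 128, or 64 dimensions. A short sketch using the truncate_dim argument of SentenceTransformer (256 is an illustrative choice):

from sentence_transformers import SentenceTransformer

# Load with embeddings truncated to the first 256 dimensions
model = SentenceTransformer("ssaba/bge-base-financial-matryoshka", truncate_dim=256)

embeddings = model.encode(["What was the significant tax benefit recorded in 2023?"])
print(embeddings.shape)
# [1, 256]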

Evaluation

Metrics

Each of the five tables below reports retrieval results at one of the trained Matryoshka dimensionalities; the assignment follows the final-epoch map@100 values in the training logs.

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.7157
cosine_accuracy@3 0.8529
cosine_accuracy@5 0.8943
cosine_accuracy@10 0.9271
cosine_precision@1 0.7157
cosine_precision@3 0.2843
cosine_precision@5 0.1789
cosine_precision@10 0.0927
cosine_recall@1 0.7157
cosine_recall@3 0.8529
cosine_recall@5 0.8943
cosine_recall@10 0.9271
cosine_ndcg@10 0.8241
cosine_mrr@10 0.7907
cosine_map@100 0.7935

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.7143
cosine_accuracy@3 0.8586
cosine_accuracy@5 0.8886
cosine_accuracy@10 0.9257
cosine_precision@1 0.7143
cosine_precision@3 0.2862
cosine_precision@5 0.1777
cosine_precision@10 0.0926
cosine_recall@1 0.7143
cosine_recall@3 0.8586
cosine_recall@5 0.8886
cosine_recall@10 0.9257
cosine_ndcg@10 0.8238
cosine_mrr@10 0.7907
cosine_map@100 0.7935

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.7071
cosine_accuracy@3 0.8486
cosine_accuracy@5 0.8929
cosine_accuracy@10 0.92
cosine_precision@1 0.7071
cosine_precision@3 0.2829
cosine_precision@5 0.1786
cosine_precision@10 0.092
cosine_recall@1 0.7071
cosine_recall@3 0.8486
cosine_recall@5 0.8929
cosine_recall@10 0.92
cosine_ndcg@10 0.819
cosine_mrr@10 0.786
cosine_map@100 0.7892

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.6871
cosine_accuracy@3 0.82
cosine_accuracy@5 0.8743
cosine_accuracy@10 0.92
cosine_precision@1 0.6871
cosine_precision@3 0.2733
cosine_precision@5 0.1749
cosine_precision@10 0.092
cosine_recall@1 0.6871
cosine_recall@3 0.82
cosine_recall@5 0.8743
cosine_recall@10 0.92
cosine_ndcg@10 0.8037
cosine_mrr@10 0.7665
cosine_map@100 0.7694

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.67
cosine_accuracy@3 0.7886
cosine_accuracy@5 0.8386
cosine_accuracy@10 0.89
cosine_precision@1 0.67
cosine_precision@3 0.2629
cosine_precision@5 0.1677
cosine_precision@10 0.089
cosine_recall@1 0.67
cosine_recall@3 0.7886
cosine_recall@5 0.8386
cosine_recall@10 0.89
cosine_ndcg@10 0.7782
cosine_mrr@10 0.7426
cosine_map@100 0.7462
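
Metrics of this shape are produced by Sentence Transformers' InformationRetrievalEvaluator. A toy sketch of how such an evaluation is set up (the query, document, and ids here are illustrative placeholders, not the actual held-out set):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("ssaba/bge-base-financial-matryoshka")

queries = {"q1": "What was the significant tax benefit recorded in 2023?"}
corpus = {"d1": "The company recorded a tax benefit of approximately $600 million in 2023."}
relevant_docs = {"q1": {"d1"}}  # which corpus ids answer each query

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="toy")
print(evaluator(model))  # accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100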

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,300 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 7, mean 20.68, max 45 tokens
    • positive: string; min 8, mean 46.0, max 371 tokens
  • Samples:
    • anchor: What happens to the guarantee provided by NBCUniversal or Comcast Cable on Comcast’s debt securities upon a disposition of the Guarantor entity?
      positive: However, a guarantee by NBCUniversal or Comcast Cable of Comcast’s debt securities, or by NBCUniversal of Comcast Cable’s debt securities, will terminate upon a disposition of such Guarantor entity or all or substantially all of its assets.
    • anchor: What authority can the FDIC invoke during the liquidation of a financial institution under certain determinations by the Secretary of the Treasury?
      positive: In the event of an appointment of a receiver for a financial institution, the FDIC could invoke the orderly liquidation authority, instead of the U.S. Bankruptcy Code, if the Secretary of the Treasury makes certain financial distress and systemic risk determinations.
    • anchor: What was the significant tax benefit recorded in 2023 for federal research and development credits?
      positive: The company recorded a tax benefit of approximately $600 million in 2023 related to federal research and development credits, based on updated estimates of qualifying expenditures from the 2022 U.S. federal R&D credit.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
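
For reference, a minimal sketch of how a dataset in this anchor/positive shape and this loss configuration are typically constructed in Sentence Transformers (variable names and the single example pair are illustrative):

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Two-column anchor/positive dataset, mirroring the samples above
train_dataset = Dataset.from_dict({
    "anchor": ["What was the significant tax benefit recorded in 2023 for federal research and development credits?"],
    "positive": ["The company recorded a tax benefit of approximately $600 million in 2023 related to federal research and development credits."],
})

# MultipleNegativesRankingLoss, wrapped so it is applied at each truncated dimensionality
# (matryoshka_weights default to 1 per dimension, matching the parameters above)
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])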
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
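
Translated into code, the settings above correspond roughly to the following SentenceTransformerTrainingArguments configuration (the output_dir is an assumed placeholder; save_strategy is added because load_best_model_at_end requires it to match eval_strategy):

from sentence_transformers.training_args import SentenceTransformerTrainingArguments, BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # assumed output path
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    optim="adamw_torch_fused",
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: not listed above, but required here
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)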

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
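
Putting the pieces together, a minimal training sketch reusing model, train_dataset, loss, and args from the sketches above (the save path is an assumed placeholder):

from sentence_transformers import SentenceTransformerTrainer

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save("bge-base-financial-matryoshka/final")  # assumed save path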

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.8122 10 1.5919 - - - - -
0.9746 12 - 0.7584 0.7756 0.7811 0.7129 0.7787
1.6244 20 0.6519 - - - - -
1.9492 24 - 0.7635 0.7834 0.7870 0.7356 0.7864
2.4365 30 0.4638 - - - - -
2.9239 36 - 0.7677 0.7872 0.7926 0.7431 0.7927
3.2487 40 0.3569 - - - - -
3.8985 48 - 0.7694 0.7892 0.7935 0.7462 0.7935
  • The row for epoch 3.8985 (step 48) is the saved checkpoint; its metrics match the Evaluation tables above.

Framework Versions

  • Python: 3.12.8
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.2.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}