BGE-M3-Telecom-Retrieval-Embedding

This is a sentence-transformers model fine-tuned from BAAI/bge-m3 on the telecom-technical-documents-retrieval-embedding-dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
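The pooling configuration above means the final sentence embedding is the hidden state of the CLS token (pooling_mode_cls_token: True), passed through a Normalize() step so that dot products equal cosine similarities. A minimal numpy sketch of those two steps, using random stand-in data rather than real transformer output:

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic the Pooling (CLS token) and Normalize modules of the architecture."""
    cls = token_embeddings[:, 0, :]  # first token of each sequence is CLS
    return cls / np.linalg.norm(cls, axis=1, keepdims=True)  # L2-normalize

# Stand-in for transformer output: batch of 2 sequences, 5 tokens, 1024-dim states
hidden = np.random.default_rng(0).normal(size=(2, 5, 1024))
sentence_emb = cls_pool_and_normalize(hidden)
print(sentence_emb.shape)                    # (2, 1024)
print(np.linalg.norm(sentence_emb, axis=1))  # [1. 1.]
```

Because of the Normalize() step, downstream similarity can be computed as a plain matrix product of the embeddings.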

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("KayaTechAI/BGE-M3-0.56B-Fine-Tuned-Telecom-Technical-Documents-Retrieval-Embedding-Generalization-Baseline")
# Run inference
sentences = [
    'What is the provisioning scope for the eMLPP service?',
    'eMLPP is provisioned per subscriber.',
    'The main objective is to verify that the User Equipment (UE) tracks channel variations and selects the optimal transport format for frequency non-selective scheduling.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.7420, -0.0991],
#         [ 0.7420,  1.0000, -0.1267],
#         [-0.0991, -0.1267,  1.0000]])
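Because training used MatryoshkaLoss over dimensions [1024, 768, 512, 256, 128, 64], the embeddings remain useful when truncated to a leading prefix and re-normalized (Sentence Transformers exposes this via the `truncate_dim` argument of `SentenceTransformer`). The arithmetic itself is just slicing plus L2 normalization; a sketch on random stand-in embeddings, not real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for model.encode(...): 3 unit-normalized 1024-dim embeddings
emb = rng.normal(size=(3, 1024))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def truncate(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Matryoshka truncation: keep the first `dim` components, re-normalize."""
    t = embeddings[:, :dim]
    return t / np.linalg.norm(t, axis=1, keepdims=True)

small = truncate(emb, 256)
sims = small @ small.T  # cosine similarities at 256 dimensions
print(small.shape)      # (3, 256)
```

Truncating to 256 dimensions cuts index size and search cost by 4x, at the price of a modest drop in retrieval quality.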

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.7852
cosine_accuracy@3 0.9008
cosine_accuracy@5 0.9324
cosine_accuracy@10 0.956
cosine_precision@1 0.7852
cosine_precision@3 0.3003
cosine_precision@5 0.1865
cosine_precision@10 0.0956
cosine_recall@1 0.7852
cosine_recall@3 0.9008
cosine_recall@5 0.9324
cosine_recall@10 0.956
cosine_ndcg@10 0.8744
cosine_mrr@10 0.8478
cosine_map@100 0.8497

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.7848
cosine_accuracy@3 0.9008
cosine_accuracy@5 0.9312
cosine_accuracy@10 0.9572
cosine_precision@1 0.7848
cosine_precision@3 0.3003
cosine_precision@5 0.1862
cosine_precision@10 0.0957
cosine_recall@1 0.7848
cosine_recall@3 0.9008
cosine_recall@5 0.9312
cosine_recall@10 0.9572
cosine_ndcg@10 0.8748
cosine_mrr@10 0.848
cosine_map@100 0.8499

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.784
cosine_accuracy@3 0.9012
cosine_accuracy@5 0.9308
cosine_accuracy@10 0.9544
cosine_precision@1 0.784
cosine_precision@3 0.3004
cosine_precision@5 0.1862
cosine_precision@10 0.0954
cosine_recall@1 0.784
cosine_recall@3 0.9012
cosine_recall@5 0.9308
cosine_recall@10 0.9544
cosine_ndcg@10 0.8734
cosine_mrr@10 0.8469
cosine_map@100 0.8488

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.7768
cosine_accuracy@3 0.8956
cosine_accuracy@5 0.9292
cosine_accuracy@10 0.9524
cosine_precision@1 0.7768
cosine_precision@3 0.2985
cosine_precision@5 0.1858
cosine_precision@10 0.0952
cosine_recall@1 0.7768
cosine_recall@3 0.8956
cosine_recall@5 0.9292
cosine_recall@10 0.9524
cosine_ndcg@10 0.8691
cosine_mrr@10 0.8418
cosine_map@100 0.8438

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.7716
cosine_accuracy@3 0.888
cosine_accuracy@5 0.92
cosine_accuracy@10 0.9456
cosine_precision@1 0.7716
cosine_precision@3 0.296
cosine_precision@5 0.184
cosine_precision@10 0.0946
cosine_recall@1 0.7716
cosine_recall@3 0.888
cosine_recall@5 0.92
cosine_recall@10 0.9456
cosine_ndcg@10 0.8624
cosine_mrr@10 0.8353
cosine_map@100 0.8374

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.7452
cosine_accuracy@3 0.86
cosine_accuracy@5 0.8992
cosine_accuracy@10 0.9332
cosine_precision@1 0.7452
cosine_precision@3 0.2867
cosine_precision@5 0.1798
cosine_precision@10 0.0933
cosine_recall@1 0.7452
cosine_recall@3 0.86
cosine_recall@5 0.8992
cosine_recall@10 0.9332
cosine_ndcg@10 0.8411
cosine_mrr@10 0.8114
cosine_map@100 0.8138
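In the tables above, accuracy@k and recall@k coincide because each query has exactly one relevant passage. Under that single-relevant-document assumption, MRR@10 reduces to the reciprocal rank of that passage when it appears in the top 10; a minimal sketch with made-up document IDs:

```python
def mrr_at_k(ranked_doc_ids, relevant_id, k=10):
    """Reciprocal rank of the single relevant document, 0 if outside the top k."""
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

# Three queries: relevant doc ranked 1st, 2nd, and outside the top 10
scores = [mrr_at_k(["a", "b"], "a"),
          mrr_at_k(["b", "a"], "a"),
          mrr_at_k(list("bcdefghijk"), "a")]
print(sum(scores) / len(scores))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

The cosine_mrr@10 values above are this quantity averaged over all evaluation queries, with documents ranked by cosine similarity.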

Training Details

Training Dataset

telecom-technical-documents-retrieval-embedding-dataset

  • Dataset: telecom-technical-documents-retrieval-embedding-dataset at 3ebf34a
  • Size: 127,731 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 8, mean 22.46, max 75 tokens
    • positive: string; min 5, mean 31.38, max 95 tokens
  • Samples (anchor → positive):
    • "What is the estimated Transmit power considered sufficient for achieving 95% Downlink coverage with a single Base Station?" → "Approximately 14 dBm Transmit power is considered sufficient."
    • "What is the primary goal of the Nominal Accuracy requirement?" → "The primary goal of the Nominal Accuracy requirement is to ensure good accuracy when signal conditions are ideal."
    • "What happens on the mobile station side if contention resolution fails because the G-RNTI value in the network's acknowledgement message differs from what the mobile station sent?" → "If the mobile station receives a PACKET UPLINK ACK/NACK message with a G-RNTI value different from the one it included in its first RLC data blocks, it signifies a contention resolution failure, and the mobile station will not transmit a PACKET CONTROL ACKNOWLEDGEMENT."
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
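MatryoshkaLoss computes the base loss at every truncated dimension and sums the results with the weights above; here the base loss is MultipleNegativesRankingLoss, an in-batch-negatives softmax cross-entropy where the i-th positive is the label for the i-th anchor. A numpy sketch of that aggregation (the scale of 20 is the Sentence Transformers default for this loss; the embeddings are random stand-ins):

```python
import numpy as np

def mnrl(anchors, positives, scale=20.0):
    """In-batch softmax cross-entropy: i-th positive is the i-th anchor's label."""
    sims = scale * (anchors @ positives.T)   # (B, B) scaled cosine similarities
    sims -= sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))      # correct pair sits on the diagonal

def matryoshka_mnrl(anchors, positives,
                    dims=(1024, 768, 512, 256, 128, 64),
                    weights=(1, 1, 1, 1, 1, 1)):
    """Truncate to each dim, re-normalize, score with the base loss, sum weighted."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    return sum(w * mnrl(unit(anchors[:, :d]), unit(positives[:, :d]))
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 1024))
p = a + 0.1 * rng.normal(size=(4, 1024))  # positives close to their anchors
total = matryoshka_mnrl(a, p)
print(total)  # small positive total loss, since each anchor matches its positive
```

Optimizing this sum is what makes the leading prefix of each embedding usable on its own, as exercised in the per-dimension evaluation above.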
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss dim_1024_cosine_ndcg@10 dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0401 10 1.8502 - - - - - -
0.0802 20 1.54 - - - - - -
0.1202 30 1.0721 - - - - - -
0.1603 40 0.8011 - - - - - -
0.2004 50 0.6764 - - - - - -
0.2405 60 0.5557 - - - - - -
0.2806 70 0.4677 - - - - - -
0.3206 80 0.4346 - - - - - -
0.3607 90 0.4035 - - - - - -
0.4008 100 0.3237 - - - - - -
0.4409 110 0.3119 - - - - - -
0.4810 120 0.2999 - - - - - -
0.5210 130 0.287 - - - - - -
0.5611 140 0.2993 - - - - - -
0.6012 150 0.3116 - - - - - -
0.6413 160 0.2408 - - - - - -
0.6814 170 0.2748 - - - - - -
0.7214 180 0.2801 - - - - - -
0.7615 190 0.2534 - - - - - -
0.8016 200 0.2597 - - - - - -
0.8417 210 0.2602 - - - - - -
0.8818 220 0.2298 - - - - - -
0.9218 230 0.2385 - - - - - -
0.9619 240 0.2156 - - - - - -
1.0 250 0.2006 0.8670 0.8656 0.8628 0.8567 0.8440 0.8126
1.0401 260 0.1423 - - - - - -
1.0802 270 0.161 - - - - - -
1.1202 280 0.1638 - - - - - -
1.1603 290 0.1358 - - - - - -
1.2004 300 0.1609 - - - - - -
1.2405 310 0.1643 - - - - - -
1.2806 320 0.1413 - - - - - -
1.3206 330 0.1411 - - - - - -
1.3607 340 0.1416 - - - - - -
1.4008 350 0.1315 - - - - - -
1.4409 360 0.1377 - - - - - -
1.4810 370 0.1174 - - - - - -
1.5210 380 0.1319 - - - - - -
1.5611 390 0.1169 - - - - - -
1.6012 400 0.115 - - - - - -
1.6413 410 0.1382 - - - - - -
1.6814 420 0.1576 - - - - - -
1.7214 430 0.1253 - - - - - -
1.7615 440 0.133 - - - - - -
1.8016 450 0.1231 - - - - - -
1.8417 460 0.1364 - - - - - -
1.8818 470 0.1047 - - - - - -
1.9218 480 0.1268 - - - - - -
1.9619 490 0.1284 - - - - - -
2.0 500 0.1134 0.8728 0.8718 0.8706 0.8660 0.8563 0.8323
2.0401 510 0.0787 - - - - - -
2.0802 520 0.0765 - - - - - -
2.1202 530 0.079 - - - - - -
2.1603 540 0.0764 - - - - - -
2.2004 550 0.0913 - - - - - -
2.2405 560 0.0797 - - - - - -
2.2806 570 0.0865 - - - - - -
2.3206 580 0.0843 - - - - - -
2.3607 590 0.0841 - - - - - -
2.4008 600 0.0842 - - - - - -
2.4409 610 0.0827 - - - - - -
2.4810 620 0.0968 - - - - - -
2.5210 630 0.0781 - - - - - -
2.5611 640 0.0745 - - - - - -
2.6012 650 0.0744 - - - - - -
2.6413 660 0.0854 - - - - - -
2.6814 670 0.0807 - - - - - -
2.7214 680 0.0678 - - - - - -
2.7615 690 0.0795 - - - - - -
2.8016 700 0.0845 - - - - - -
2.8417 710 0.0846 - - - - - -
2.8818 720 0.0957 - - - - - -
2.9218 730 0.0723 - - - - - -
2.9619 740 0.0676 - - - - - -
3.0 750 0.0804 0.8748 0.8736 0.8728 0.8703 0.8633 0.8394
3.0401 760 0.053 - - - - - -
3.0802 770 0.0622 - - - - - -
3.1202 780 0.0777 - - - - - -
3.1603 790 0.07 - - - - - -
3.2004 800 0.0662 - - - - - -
3.2405 810 0.0704 - - - - - -
3.2806 820 0.0722 - - - - - -
3.3206 830 0.0663 - - - - - -
3.3607 840 0.0631 - - - - - -
3.4008 850 0.0616 - - - - - -
3.4409 860 0.0639 - - - - - -
3.4810 870 0.0595 - - - - - -
3.5210 880 0.071 - - - - - -
3.5611 890 0.0748 - - - - - -
3.6012 900 0.0648 - - - - - -
3.6413 910 0.067 - - - - - -
3.6814 920 0.0625 - - - - - -
3.7214 930 0.0619 - - - - - -
3.7615 940 0.0612 - - - - - -
3.8016 950 0.067 - - - - - -
3.8417 960 0.0597 - - - - - -
3.8818 970 0.0593 - - - - - -
3.9218 980 0.0742 - - - - - -
3.9619 990 0.0691 - - - - - -
4.0 1000 0.067 0.8744 0.8748 0.8734 0.8691 0.8624 0.8411
  • The row for epoch 4.0 (step 1000), bolded in the original card, denotes the saved checkpoint; its per-dimension ndcg@10 values match the evaluation tables above.

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.3
  • Transformers: 4.55.4
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}