BGE-M3-Telecom-Retrieval-Embedding

This is a sentence-transformers model fine-tuned from BAAI/bge-m3 on the telecom-technical-documents-retrieval-embedding-dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
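The pooling configuration above means the final sentence embedding is the hidden state of the CLS token (pooling_mode_cls_token: True), passed through a Normalize() step so that dot products equal cosine similarities. A minimal numpy sketch of those two steps, using random stand-in data rather than real transformer output:

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic the Pooling (CLS token) and Normalize modules of the architecture."""
    cls = token_embeddings[:, 0, :]  # first token of each sequence is CLS
    return cls / np.linalg.norm(cls, axis=1, keepdims=True)  # L2-normalize

# Stand-in for transformer output: batch of 2 sequences, 5 tokens, 1024-dim states
hidden = np.random.default_rng(0).normal(size=(2, 5, 1024))
sentence_emb = cls_pool_and_normalize(hidden)
print(sentence_emb.shape)                    # (2, 1024)
print(np.linalg.norm(sentence_emb, axis=1))  # [1. 1.]
```

Because of the Normalize() step, downstream similarity can be computed as a plain matrix product of the embeddings.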

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("KayaTechAI/BGE-M3-0.56B-Fine-Tuned-Telecom-Technical-Documents-Retrieval-Embedding-Generalization-Baseline")
# Run inference
sentences = [
    'What is the provisioning scope for the eMLPP service?',
    'eMLPP is provisioned per subscriber.',
    'The main objective is to verify that the User Equipment (UE) tracks channel variations and selects the optimal transport format for frequency non-selective scheduling.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.7420, -0.0991],
#         [ 0.7420,  1.0000, -0.1267],
#         [-0.0991, -0.1267,  1.0000]])
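Because training used MatryoshkaLoss over dimensions [1024, 768, 512, 256, 128, 64], the embeddings remain useful when truncated to a leading prefix and re-normalized (Sentence Transformers exposes this via the `truncate_dim` argument of `SentenceTransformer`). The arithmetic itself is just slicing plus L2 normalization; a sketch on random stand-in embeddings, not real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for model.encode(...): 3 unit-normalized 1024-dim embeddings
emb = rng.normal(size=(3, 1024))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def truncate(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Matryoshka truncation: keep the first `dim` components, re-normalize."""
    t = embeddings[:, :dim]
    return t / np.linalg.norm(t, axis=1, keepdims=True)

small = truncate(emb, 256)
sims = small @ small.T  # cosine similarities at 256 dimensions
print(small.shape)      # (3, 256)
```

Truncating to 256 dimensions cuts index size and search cost by 4x, at the price of a modest drop in retrieval quality.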

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.7852
cosine_accuracy@3 0.9008
cosine_accuracy@5 0.9324
cosine_accuracy@10 0.956
cosine_precision@1 0.7852
cosine_precision@3 0.3003
cosine_precision@5 0.1865
cosine_precision@10 0.0956
cosine_recall@1 0.7852
cosine_recall@3 0.9008
cosine_recall@5 0.9324
cosine_recall@10 0.956
cosine_ndcg@10 0.8744
cosine_mrr@10 0.8478
cosine_map@100 0.8497

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.7848
cosine_accuracy@3 0.9008
cosine_accuracy@5 0.9312
cosine_accuracy@10 0.9572
cosine_precision@1 0.7848
cosine_precision@3 0.3003
cosine_precision@5 0.1862
cosine_precision@10 0.0957
cosine_recall@1 0.7848
cosine_recall@3 0.9008
cosine_recall@5 0.9312
cosine_recall@10 0.9572
cosine_ndcg@10 0.8748
cosine_mrr@10 0.848
cosine_map@100 0.8499

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.784
cosine_accuracy@3 0.9012
cosine_accuracy@5 0.9308
cosine_accuracy@10 0.9544
cosine_precision@1 0.784
cosine_precision@3 0.3004
cosine_precision@5 0.1862
cosine_precision@10 0.0954
cosine_recall@1 0.784
cosine_recall@3 0.9012
cosine_recall@5 0.9308
cosine_recall@10 0.9544
cosine_ndcg@10 0.8734
cosine_mrr@10 0.8469
cosine_map@100 0.8488

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.7768
cosine_accuracy@3 0.8956
cosine_accuracy@5 0.9292
cosine_accuracy@10 0.9524
cosine_precision@1 0.7768
cosine_precision@3 0.2985
cosine_precision@5 0.1858
cosine_precision@10 0.0952
cosine_recall@1 0.7768
cosine_recall@3 0.8956
cosine_recall@5 0.9292
cosine_recall@10 0.9524
cosine_ndcg@10 0.8691
cosine_mrr@10 0.8418
cosine_map@100 0.8438

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.7716
cosine_accuracy@3 0.888
cosine_accuracy@5 0.92
cosine_accuracy@10 0.9456
cosine_precision@1 0.7716
cosine_precision@3 0.296
cosine_precision@5 0.184
cosine_precision@10 0.0946
cosine_recall@1 0.7716
cosine_recall@3 0.888
cosine_recall@5 0.92
cosine_recall@10 0.9456
cosine_ndcg@10 0.8624
cosine_mrr@10 0.8353
cosine_map@100 0.8374

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.7452
cosine_accuracy@3 0.86
cosine_accuracy@5 0.8992
cosine_accuracy@10 0.9332
cosine_precision@1 0.7452
cosine_precision@3 0.2867
cosine_precision@5 0.1798
cosine_precision@10 0.0933
cosine_recall@1 0.7452
cosine_recall@3 0.86
cosine_recall@5 0.8992
cosine_recall@10 0.9332
cosine_ndcg@10 0.8411
cosine_mrr@10 0.8114
cosine_map@100 0.8138
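In the tables above, accuracy@k and recall@k coincide because each query has exactly one relevant passage. Under that single-relevant-document assumption, MRR@10 reduces to the reciprocal rank of that passage when it appears in the top 10; a minimal sketch with made-up document IDs:

```python
def mrr_at_k(ranked_doc_ids, relevant_id, k=10):
    """Reciprocal rank of the single relevant document, 0 if outside the top k."""
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

# Three queries: relevant doc ranked 1st, 2nd, and outside the top 10
scores = [mrr_at_k(["a", "b"], "a"),
          mrr_at_k(["b", "a"], "a"),
          mrr_at_k(list("bcdefghijk"), "a")]
print(sum(scores) / len(scores))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

The cosine_mrr@10 values above are this quantity averaged over all evaluation queries, with documents ranked by cosine similarity.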

Training Details

Training Dataset

telecom-technical-documents-retrieval-embedding-dataset

  • Dataset: telecom-technical-documents-retrieval-embedding-dataset at 3ebf34a
  • Size: 127,731 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 8, mean 22.46, max 75 tokens
    • positive: string; min 5, mean 31.38, max 95 tokens
  • Samples (anchor → positive):
    • "What is the estimated Transmit power considered sufficient for achieving 95% Downlink coverage with a single Base Station?" → "Approximately 14 dBm Transmit power is considered sufficient."
    • "What is the primary goal of the Nominal Accuracy requirement?" → "The primary goal of the Nominal Accuracy requirement is to ensure good accuracy when signal conditions are ideal."
    • "What happens on the mobile station side if contention resolution fails because the G-RNTI value in the network's acknowledgement message differs from what the mobile station sent?" → "If the mobile station receives a PACKET UPLINK ACK/NACK message with a G-RNTI value different from the one it included in its first RLC data blocks, it signifies a contention resolution failure, and the mobile station will not transmit a PACKET CONTROL ACKNOWLEDGEMENT."
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
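MatryoshkaLoss computes the base loss at every truncated dimension and sums the results with the weights above; here the base loss is MultipleNegativesRankingLoss, an in-batch-negatives softmax cross-entropy where the i-th positive is the label for the i-th anchor. A numpy sketch of that aggregation (the scale of 20 is the Sentence Transformers default for this loss; the embeddings are random stand-ins):

```python
import numpy as np

def mnrl(anchors, positives, scale=20.0):
    """In-batch softmax cross-entropy: i-th positive is the i-th anchor's label."""
    sims = scale * (anchors @ positives.T)   # (B, B) scaled cosine similarities
    sims -= sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))      # correct pair sits on the diagonal

def matryoshka_mnrl(anchors, positives,
                    dims=(1024, 768, 512, 256, 128, 64),
                    weights=(1, 1, 1, 1, 1, 1)):
    """Truncate to each dim, re-normalize, score with the base loss, sum weighted."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    return sum(w * mnrl(unit(anchors[:, :d]), unit(positives[:, :d]))
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 1024))
p = a + 0.1 * rng.normal(size=(4, 1024))  # positives close to their anchors
total = matryoshka_mnrl(a, p)
print(total)  # small positive total loss, since each anchor matches its positive
```

Optimizing this sum is what makes the leading prefix of each embedding usable on its own, as exercised in the per-dimension evaluation above.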
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss dim_1024_cosine_ndcg@10 dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0401 10 1.8502 - - - - - -
0.0802 20 1.54 - - - - - -
0.1202 30 1.0721 - - - - - -
0.1603 40 0.8011 - - - - - -
0.2004 50 0.6764 - - - - - -
0.2405 60 0.5557 - - - - - -
0.2806 70 0.4677 - - - - - -
0.3206 80 0.4346 - - - - - -
0.3607 90 0.4035 - - - - - -
0.4008 100 0.3237 - - - - - -
0.4409 110 0.3119 - - - - - -
0.4810 120 0.2999 - - - - - -
0.5210 130 0.287 - - - - - -
0.5611 140 0.2993 - - - - - -
0.6012 150 0.3116 - - - - - -
0.6413 160 0.2408 - - - - - -
0.6814 170 0.2748 - - - - - -
0.7214 180 0.2801 - - - - - -
0.7615 190 0.2534 - - - - - -
0.8016 200 0.2597 - - - - - -
0.8417 210 0.2602 - - - - - -
0.8818 220 0.2298 - - - - - -
0.9218 230 0.2385 - - - - - -
0.9619 240 0.2156 - - - - - -
1.0 250 0.2006 0.8670 0.8656 0.8628 0.8567 0.8440 0.8126
1.0401 260 0.1423 - - - - - -
1.0802 270 0.161 - - - - - -
1.1202 280 0.1638 - - - - - -
1.1603 290 0.1358 - - - - - -
1.2004 300 0.1609 - - - - - -
1.2405 310 0.1643 - - - - - -
1.2806 320 0.1413 - - - - - -
1.3206 330 0.1411 - - - - - -
1.3607 340 0.1416 - - - - - -
1.4008 350 0.1315 - - - - - -
1.4409 360 0.1377 - - - - - -
1.4810 370 0.1174 - - - - - -
1.5210 380 0.1319 - - - - - -
1.5611 390 0.1169 - - - - - -
1.6012 400 0.115 - - - - - -
1.6413 410 0.1382 - - - - - -
1.6814 420 0.1576 - - - - - -
1.7214 430 0.1253 - - - - - -
1.7615 440 0.133 - - - - - -
1.8016 450 0.1231 - - - - - -
1.8417 460 0.1364 - - - - - -
1.8818 470 0.1047 - - - - - -
1.9218 480 0.1268 - - - - - -
1.9619 490 0.1284 - - - - - -
2.0 500 0.1134 0.8728 0.8718 0.8706 0.8660 0.8563 0.8323
2.0401 510 0.0787 - - - - - -
2.0802 520 0.0765 - - - - - -
2.1202 530 0.079 - - - - - -
2.1603 540 0.0764 - - - - - -
2.2004 550 0.0913 - - - - - -
2.2405 560 0.0797 - - - - - -
2.2806 570 0.0865 - - - - - -
2.3206 580 0.0843 - - - - - -
2.3607 590 0.0841 - - - - - -
2.4008 600 0.0842 - - - - - -
2.4409 610 0.0827 - - - - - -
2.4810 620 0.0968 - - - - - -
2.5210 630 0.0781 - - - - - -
2.5611 640 0.0745 - - - - - -
2.6012 650 0.0744 - - - - - -
2.6413 660 0.0854 - - - - - -
2.6814 670 0.0807 - - - - - -
2.7214 680 0.0678 - - - - - -
2.7615 690 0.0795 - - - - - -
2.8016 700 0.0845 - - - - - -
2.8417 710 0.0846 - - - - - -
2.8818 720 0.0957 - - - - - -
2.9218 730 0.0723 - - - - - -
2.9619 740 0.0676 - - - - - -
3.0 750 0.0804 0.8748 0.8736 0.8728 0.8703 0.8633 0.8394
3.0401 760 0.053 - - - - - -
3.0802 770 0.0622 - - - - - -
3.1202 780 0.0777 - - - - - -
3.1603 790 0.07 - - - - - -
3.2004 800 0.0662 - - - - - -
3.2405 810 0.0704 - - - - - -
3.2806 820 0.0722 - - - - - -
3.3206 830 0.0663 - - - - - -
3.3607 840 0.0631 - - - - - -
3.4008 850 0.0616 - - - - - -
3.4409 860 0.0639 - - - - - -
3.4810 870 0.0595 - - - - - -
3.5210 880 0.071 - - - - - -
3.5611 890 0.0748 - - - - - -
3.6012 900 0.0648 - - - - - -
3.6413 910 0.067 - - - - - -
3.6814 920 0.0625 - - - - - -
3.7214 930 0.0619 - - - - - -
3.7615 940 0.0612 - - - - - -
3.8016 950 0.067 - - - - - -
3.8417 960 0.0597 - - - - - -
3.8818 970 0.0593 - - - - - -
3.9218 980 0.0742 - - - - - -
3.9619 990 0.0691 - - - - - -
4.0 1000 0.067 0.8744 0.8748 0.8734 0.8691 0.8624 0.8411
  • The row for epoch 4.0 (step 1000), bolded in the original card, denotes the saved checkpoint; its per-dimension ndcg@10 values match the evaluation tables above.

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.3
  • Transformers: 4.55.4
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}