SentenceTransformer based on Alibaba-NLP/gte-modernbert-base

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-modernbert-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-modernbert-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
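
The similarity function listed above is cosine similarity. As a minimal illustration in plain Python, the sketch below computes it for toy 3-dimensional vectors that stand in for the model's 768-dimensional embeddings. The helper name `cosine_similarity` is hypothetical and is not part of the library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (assumed for illustration only)
u = [1.0, 2.0, 2.0]
v = [2.0, 4.0, 4.0]   # same direction as u, twice the length
w = [2.0, -1.0, 0.0]  # orthogonal to u

print(cosine_similarity(u, v))  # 1.0
print(cosine_similarity(u, w))  # 0.0
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.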

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
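
In the Pooling configuration above, only `pooling_mode_cls_token` is enabled, so the sentence embedding is the hidden state of the first token rather than an average over all tokens. The toy sketch below contrasts the two using plain lists. The helpers `cls_pool` and `mean_pool` and the 2-dimensional token vectors are assumptions made for illustration, not the library's implementation.

```python
def cls_pool(token_embeddings):
    """CLS pooling: use the first token's vector as the sentence embedding."""
    return list(token_embeddings[0])

def mean_pool(token_embeddings, attention_mask):
    """Mean pooling over non-padding tokens, shown only for contrast."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# Toy per-token hidden states for one sentence (assumed values)
tokens = [
    [1.0, 0.0],  # position 0: the [CLS] token
    [3.0, 2.0],
    [5.0, 4.0],
    [9.0, 9.0],  # padding, masked out below
]
mask = [1, 1, 1, 0]

print(cls_pool(tokens))         # [1.0, 0.0]
print(mean_pool(tokens, mask))  # [3.0, 2.0]
```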

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jayshah5696/er-gte-modernbert-base-pipe-ft")
# Run inference
sentences = [
    'Omar | Kelly | Cunningham, Williams and Williams | omar.kelly@icloud.com | Germany',
    'Omar | Kully |  | omar.kelly@icloud.com | Germany',
    'Karen | Williams | Cunningham Ltd | kwilliams@cunningham-ltd.com | USA',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.8735,  0.2695],
#         [ 0.8735,  1.0000, -0.0470],
#         [ 0.2695, -0.0470,  1.0000]])
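
The example inputs above are records flattened into pipe-delimited strings, and the first two rows (which differ by a typo and a missing field) score far higher with each other than with the third. The sketch below shows one way such strings might be built and how a similarity matrix could be thresholded to list likely duplicates. The field names, the helpers `format_record` and `candidate_pairs`, and the 0.8 threshold are all assumptions inferred from the example, not something the card specifies.

```python
def format_record(first_name, last_name, company, email, country):
    """Join fields with ' | '; missing values become empty strings."""
    fields = [first_name, last_name, company, email, country]
    return " | ".join("" if f is None else str(f).strip() for f in fields)

def candidate_pairs(similarities, threshold):
    """Return (i, j, score) for every i < j whose score meets the threshold."""
    pairs = []
    for i, row in enumerate(similarities):
        for j in range(i + 1, len(row)):
            if row[j] >= threshold:
                pairs.append((i, j, row[j]))
    return pairs

print(format_record("Omar", "Kully", None, "omar.kelly@icloud.com", "Germany"))
# Omar | Kully |  | omar.kelly@icloud.com | Germany

# Scores copied from the printed similarity matrix in the usage example
scores = [
    [1.0000, 0.8735, 0.2695],
    [0.8735, 1.0000, -0.0470],
    [0.2695, -0.0470, 1.0000],
]
print(candidate_pairs(scores, threshold=0.8))
# [(0, 1, 0.8735)]
```

A suitable threshold depends on the data and should be tuned on labeled pairs rather than taken from this sketch.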

Training Details

Training Dataset

Unnamed Dataset

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 256
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • warmup_steps: 704
  • bf16: True
  • dataloader_num_workers: 4
  • gradient_checkpointing: True
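
With `lr_scheduler_type: linear` (listed under All Hyperparameters) and `warmup_steps: 704`, the learning rate ramps linearly from 0 to 2e-05 over the first 704 steps and then decays linearly to 0. The sketch below reproduces that shape in plain Python. The value `total_steps=7047` is an assumption inferred from the training log (about 2,349 steps per epoch over 3 epochs), and `linear_warmup_lr` is a hypothetical helper, not a library function.

```python
def linear_warmup_lr(step, base_lr=2e-5, warmup_steps=704, total_steps=7047):
    """Linear warmup followed by linear decay to zero.

    total_steps is an assumed value inferred from the training log.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Start, mid-warmup, peak, and final step
for step in (0, 352, 704, 7047):
    print(step, linear_warmup_lr(step))
```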

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.0
  • warmup_steps: 704
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0213 50 2.8144
0.0426 100 0.1509
0.0639 150 0.0242
0.0851 200 0.024
0.1064 250 0.019
0.1277 300 0.021
0.1490 350 0.0179
0.1703 400 0.0127
0.1916 450 0.0131
0.2129 500 0.0119
0.2341 550 0.0147
0.2554 600 0.0101
0.2767 650 0.0143
0.2980 700 0.0196
0.3193 750 0.0179
0.3406 800 0.0186
0.3619 850 0.0118
0.3831 900 0.0147
0.4044 950 0.0144
0.4257 1000 0.0149
0.4470 1050 0.0154
0.4683 1100 0.0207
0.4896 1150 0.0103
0.5109 1200 0.0234
0.5321 1250 0.0193
0.5534 1300 0.0113
0.5747 1350 0.0125
0.5960 1400 0.0117
0.6173 1450 0.0178
0.6386 1500 0.0098
0.6599 1550 0.0122
0.6811 1600 0.0155
0.7024 1650 0.0144
0.7237 1700 0.0185
0.7450 1750 0.0135
0.7663 1800 0.012
0.7876 1850 0.0108
0.8089 1900 0.0127
0.8301 1950 0.0089
0.8514 2000 0.0154
0.8727 2050 0.0126
0.8940 2100 0.0123
0.9153 2150 0.014
0.9366 2200 0.0107
0.9579 2250 0.0128
0.9791 2300 0.0145
1.0004 2350 0.0096
1.0217 2400 0.0076
1.0430 2450 0.0092
1.0643 2500 0.0098
1.0856 2550 0.0155
1.1069 2600 0.0126
1.1281 2650 0.0098
1.1494 2700 0.0151
1.1707 2750 0.0144
1.1920 2800 0.011
1.2133 2850 0.0208
1.2346 2900 0.0191
1.2559 2950 0.0153
1.2771 3000 0.012
1.2984 3050 0.0155
1.3197 3100 0.0122
1.3410 3150 0.0143
1.3623 3200 0.0091
1.3836 3250 0.0084
1.4049 3300 0.0112
1.4261 3350 0.008
1.4474 3400 0.0089
1.4687 3450 0.0084
1.4900 3500 0.0121
1.5113 3550 0.0181
1.5326 3600 0.0065
1.5539 3650 0.0094
1.5751 3700 0.0098
1.5964 3750 0.0143
1.6177 3800 0.011
1.6390 3850 0.0152
1.6603 3900 0.0103
1.6816 3950 0.0112
1.7029 4000 0.0108
1.7241 4050 0.0103
1.7454 4100 0.0084
1.7667 4150 0.0127
1.7880 4200 0.0081
1.8093 4250 0.0101
1.8306 4300 0.0132
1.8519 4350 0.0167
1.8731 4400 0.0123
1.8944 4450 0.0124
1.9157 4500 0.0116
1.9370 4550 0.0146
1.9583 4600 0.0088
1.9796 4650 0.0129
2.0009 4700 0.0087
2.0221 4750 0.009
2.0434 4800 0.0116
2.0647 4850 0.0128
2.0860 4900 0.0079
2.1073 4950 0.0093
2.1286 5000 0.0168
2.1499 5050 0.0087
2.1711 5100 0.0154
2.1924 5150 0.0102
2.2137 5200 0.0106
2.2350 5250 0.013
2.2563 5300 0.0107
2.2776 5350 0.0175
2.2989 5400 0.0098
2.3201 5450 0.0127
2.3414 5500 0.0144
2.3627 5550 0.0106
2.3840 5600 0.011
2.4053 5650 0.0147
2.4266 5700 0.0096
2.4479 5750 0.0165
2.4691 5800 0.015
2.4904 5850 0.0068
2.5117 5900 0.0144
2.5330 5950 0.0128
2.5543 6000 0.0102
2.5756 6050 0.0128
2.5968 6100 0.0173
2.6181 6150 0.0156
2.6394 6200 0.0084
2.6607 6250 0.0154
2.6820 6300 0.0086
2.7033 6350 0.011
2.7246 6400 0.0107
2.7458 6450 0.012
2.7671 6500 0.0125
2.7884 6550 0.0107
2.8097 6600 0.009
2.8310 6650 0.0079
2.8523 6700 0.0141
2.8736 6750 0.01
2.8948 6800 0.0065
2.9161 6850 0.0084
2.9374 6900 0.0103
2.9587 6950 0.0107
2.9800 7000 0.0085
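
The log rows above are whitespace-separated `epoch step loss` triples, so they are easy to summarize. The sketch below averages the logged loss per epoch and estimates the number of optimizer steps per epoch from a single row. The helper names are hypothetical, and only a handful of rows are copied in as a sample, so the printed averages describe the sample rather than the full run.

```python
import math
from collections import defaultdict

def mean_loss_by_epoch(rows):
    """Average the logged loss within each epoch (bucket = floor of epoch)."""
    buckets = defaultdict(list)
    for line in rows:
        epoch, _step, loss = line.split()
        buckets[math.floor(float(epoch))].append(float(loss))
    return {e: sum(v) / len(v) for e, v in sorted(buckets.items())}

def steps_per_epoch(row):
    """Estimate steps per epoch from one 'epoch step loss' row."""
    epoch, step, _loss = row.split()
    return round(int(step) / float(epoch))

# A few rows copied from the log above (sample only)
sample = [
    "0.0213 50 2.8144",
    "0.0426 100 0.1509",
    "1.0004 2350 0.0096",
    "1.0217 2400 0.0076",
    "2.9800 7000 0.0085",
]

print(mean_loss_by_epoch(sample))
print(steps_per_epoch("1.0004 2350 0.0096"))  # 2349
```

Multiplying the estimated 2,349 steps by the batch size of 256 suggests roughly 600,000 training examples per epoch, assuming a single device; the card does not state the device count.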

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 5.2.3
  • Transformers: 4.57.6
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.13.0
  • Datasets: 4.6.1
  • Tokenizers: 0.22.2
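
To compare a local environment against the versions listed above, a small check along these lines may help. It is only a sketch: the distribution names are the usual PyPI names for these libraries, `installed_versions` is a hypothetical helper, and matching versions are not required to load the model.

```python
from importlib import metadata

# Versions as reported in this card
EXPECTED = {
    "sentence-transformers": "5.2.3",
    "transformers": "4.57.6",
    "torch": "2.6.0+cu124",
    "accelerate": "1.13.0",
    "datasets": "4.6.1",
    "tokenizers": "0.22.2",
}

def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

for name, have in installed_versions(EXPECTED).items():
    status = "ok" if have == EXPECTED[name] else "differs"
    print(f"{name}: expected {EXPECTED[name]}, found {have} ({status})")
```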

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

  • Model size: 0.1B params (Safetensors)
  • Tensor type: F32