all-MiniLM-L6-v6-pair_score

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
Language: en
License: apache-2.0
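The similarity function listed above is cosine similarity: for two embeddings u and v it is u·v / (‖u‖‖v‖), ranging from -1 to 1. The minimal NumPy sketch below computes an all-pairs cosine similarity matrix. It is an illustration of the formula, not the library's implementation, and the toy 4-dimensional vectors are made up to stand in for real 384-dimensional embeddings.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """All-pairs cosine similarity between the rows of a (n, d) and b (m, d)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)  # scale rows to unit length
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T  # shape (n, m), values in [-1, 1]

# Toy vectors (hypothetical), standing in for model embeddings
emb = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
])
sims = cosine_similarity_matrix(emb, emb)
print(sims.shape)  # (3, 3)
```

Because the vectors are divided by their norms, only direction matters: the third row has length 3 but still has a similarity of exactly 1 with itself.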

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
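The architecture above uses mean pooling (`pooling_mode_mean_tokens: True`) followed by `Normalize()`. The NumPy sketch below shows what those two steps compute, assuming token embeddings of shape (batch, seq_len, dim) and a 0/1 attention mask. The real modules operate on PyTorch tensors; the function names and toy values here are hypothetical.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding (mask == 0).

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # sum of real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # avoid division by zero
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length, as a Normalize step does."""
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)

# Toy batch: 2 sequences, 3 token positions, 2-dim vectors (the real model uses 384)
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
    [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])
pooled = mean_pool(tokens, mask)                # row 0 -> [2, 3]; row 1 -> [4/3, 4/3]
sentence_embeddings = l2_normalize(pooled)      # every row now has norm 1
```

Masking matters: without it, the padding vector [100, 100] would dominate the first sentence's average.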

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")  # placeholder: replace with this model's Hub ID
# Run inference
sentences = [
    'basic choker',
    'unisex sweatshirt',
    'unisex sweatshirt',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
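For semantic search, the encoded corpus is ranked by its similarity to an encoded query. Because this model ends with a Normalize() module, its embeddings have unit length, so a plain dot product equals cosine similarity. The NumPy sketch below illustrates the ranking step on hand-made unit vectors; `top_k` is a hypothetical helper, and in practice the library's own `sentence_transformers.util.semantic_search` covers the same use case.

```python
import numpy as np

def top_k(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 2):
    """Return (indices, scores) of the k corpus rows most similar to the query.

    Assumes both inputs are already L2-normalized, so dot product == cosine.
    """
    scores = corpus_emb @ query_emb        # shape (num_corpus,)
    order = np.argsort(-scores)[:k]        # highest score first
    return order, scores[order]

# Toy unit vectors; with the real model these would come from model.encode(...)
corpus = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.0, 1.0])
idx, s = top_k(query, corpus, k=2)
print(idx)  # [2 1]
```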

Training Details

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
learning_rate: 2e-05
num_train_epochs: 15
warmup_ratio: 0.1
fp16: True
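With `learning_rate: 2e-05`, `warmup_ratio: 0.1` and the linear scheduler (`lr_scheduler_type: linear` in the full list), the learning rate rises linearly from 0 to 2e-05 over the first 10% of optimizer steps, then decays linearly to 0. The pure-Python sketch below reproduces that shape under two assumptions: warmup length is ceil(total_steps × warmup_ratio), and there are about 581 optimizer steps per epoch, inferred from the training log (step 100 is logged at epoch 0.1721). The function name is hypothetical.

```python
import math

def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 2e-5,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at an optimizer step for a linear schedule with warmup."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr down to 0 at total_steps
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Assumption: ~581 steps per epoch (inferred from the log), 15 epochs
total = 581 * 15                 # 8715 steps
print(math.ceil(total * 0.1))    # 872 warmup steps
```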

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 15
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss
0.1721	100	10.8697	-
0.3442	200	9.1125	-
0.5164	300	6.8873	-
0.6885	400	3.1124	-
0.8606	500	1.0882	-
1.0327	600	0.869	-
1.2048	700	0.6952	-
1.3769	800	0.5522	-
1.5491	900	0.5184	-
1.7212	1000	0.3996	-
1.8933	1100	0.6316	-
2.0654	1200	0.5352	-
2.2375	1300	0.3731	-
2.4096	1400	0.3376	-
2.5818	1500	0.597	-
2.7539	1600	0.5737	-
2.9260	1700	0.7107	-
3.0981	1800	0.4356	-
3.2702	1900	0.5581	-
3.4423	2000	0.2012	-
3.6145	2100	0.3906	-
3.7866	2200	0.5386	-
3.9587	2300	0.2624	-
4.1308	2400	0.3573	-
4.3029	2500	0.4798	-
4.4750	2600	0.2465	-
4.6472	2700	0.3482	-
4.8193	2800	0.1915	-
4.9914	2900	0.4617	-
5.1635	3000	0.2874	-
5.3356	3100	0.4636	-
5.5077	3200	0.1344	-
5.6799	3300	0.3615	-
5.8520	3400	0.309	-
6.0241	3500	0.1883	-
6.1962	3600	0.4029	-
6.3683	3700	0.2082	-
6.5404	3800	0.1333	-
6.7126	3900	0.1509	-
6.8847	4000	0.6264	-
7.0568	4100	0.2177	-
7.2289	4200	0.1957	-
7.4010	4300	0.2887	-
7.5731	4400	0.2271	-
7.7453	4500	0.3486	-
7.9174	4600	0.4429	-
8.0895	4700	0.4398	-
8.2616	4800	0.31	-
8.4337	4900	0.2045	-
8.6059	5000	0.2583	0.2371
8.7780	5100	0.2774	-
8.9501	5200	0.1902	-
9.1222	5300	0.3058	-
9.2943	5400	0.3742	-
9.4664	5500	0.2972	-
9.6386	5600	0.3084	-
9.8107	5700	0.1215	-
9.9828	5800	0.1876	-
10.1549	5900	0.1702	-
10.3270	6000	0.2506	-
10.4991	6100	0.2852	-
10.6713	6200	0.2354	-
10.8434	6300	0.214	-
11.0155	6400	0.3815	-
11.1876	6500	0.0803	-
11.3597	6600	0.1941	-
11.5318	6700	0.1576	-
11.7040	6800	0.2911	-
11.8761	6900	0.4913	-
12.0482	7000	0.2759	-
12.2203	7100	0.2928	-
12.3924	7200	0.2181	-
12.5645	7300	0.1286	-
12.7367	7400	0.3342	-
12.9088	7500	0.1577	-
13.0809	7600	0.2578	-
13.2530	7700	0.2844	-
13.4251	7800	0.0917	-
13.5972	7900	0.2617	-
13.7694	8000	0.3021	-
13.9415	8100	0.1036	-
14.1136	8200	0.5471	-
14.2857	8300	0.2395	-
14.4578	8400	0.2664	-
14.6299	8500	0.2697	-
14.8021	8600	0.1569	-
14.9742	8700	0.116	-
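The Epoch column in the log above is the global step divided by the number of optimizer steps per epoch. The short check below, a sketch assuming no gradient accumulation (`gradient_accumulation_steps: 1`), recovers that value from the last logged row and confirms it reproduces a few logged epochs.

```python
# A few (epoch, step) rows copied from the training log above
rows = [(0.1721, 100), (8.6059, 5000), (14.9742, 8700)]

# Estimate optimizer steps per epoch from the last logged row
steps_per_epoch = round(8700 / 14.9742)

for epoch, step in rows:
    # Logged epoch = global step / steps per epoch, rounded to 4 decimals
    assert round(step / steps_per_epoch, 4) == epoch

print(steps_per_epoch)  # 581
```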

Framework Versions

Python: 3.8.10
Sentence Transformers: 3.1.1
Transformers: 4.45.2
PyTorch: 2.4.1+cu118
Accelerate: 1.0.1
Datasets: 3.0.1
Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

AnglELoss

@misc{li2023angleoptimized,
    title={AnglE-optimized Text Embeddings},
    author={Xianming Li and Jing Li},
    year={2023},
    eprint={2309.12871},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}