ModernBERT Embed base Legal Matryoshka

This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
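
The pooling and normalization stages listed above can be illustrated without loading the model. The following is a minimal sketch, assuming per-token embeddings and an attention mask are already available. The helper names `mean_pool` and `l2_normalize` and the toy 4-dimensional vectors are hypothetical, and this is not the library's own implementation.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    return [s / max(count, 1) for s in summed]

def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy input: three tokens with 4-dimensional embeddings; the last token is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final stage normalizes every embedding, cosine similarity and dot product give the same ranking for this model's outputs.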

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/modernbert-embed-base-legal-matryoshka-corrected_train_set_double_training_set_2026_03_09")
# Run inference
sentences = [
    'What authority did the Court believe the Board exercised?',
    'court opined that the Board exercised “substantial independent authority” and thus was also a \nFOIA “agency” under Soucie’s functional test.  Id. at 584–85. \nThis Court’s previous opinion followed Energy Research’s analytical steps.  As with the \nBoard, Congress made the Commission an “establishment in the executive branch,” one of the',
    'On what date were the corporate filings with the Delaware Department of State mentioned?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5720, 0.0843],
#         [0.5720, 1.0000, 0.0574],
#         [0.0843, 0.0574, 1.0000]])
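
Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, embeddings can be shortened by keeping the leading components and re-normalizing. Sentence Transformers offers a `truncate_dim` argument on `SentenceTransformer` for this purpose. The sketch below performs the same operation by hand on toy vectors so that it runs without downloading the model; the helper names and the 8-dimensional vectors are hypothetical stand-ins.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional "embeddings" standing in for the model's 768-dimensional output.
query = [0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05]
passage = [0.45, 0.5, 0.2, 0.25, 0.0, 0.2, 0.1, 0.0]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

Truncated vectors must be re-normalized before a plain dot product is used as the similarity score, because a prefix of a unit vector is generally shorter than unit length.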

Evaluation

Metrics

Information Retrieval (dim_768, full 768-dimensional embeddings)

Metric Value
cosine_accuracy@1 0.6213
cosine_accuracy@3 0.6723
cosine_accuracy@5 0.7372
cosine_accuracy@10 0.7929
cosine_precision@1 0.6213
cosine_precision@3 0.5914
cosine_precision@5 0.4451
cosine_precision@10 0.2516
cosine_recall@1 0.2179
cosine_recall@3 0.5733
cosine_recall@5 0.6967
cosine_recall@10 0.7844
cosine_ndcg@10 0.7106
cosine_mrr@10 0.6624
cosine_map@100 0.6971

Information Retrieval (dim_512, embeddings truncated to 512 dimensions)

Metric Value
cosine_accuracy@1 0.6043
cosine_accuracy@3 0.66
cosine_accuracy@5 0.7264
cosine_accuracy@10 0.7821
cosine_precision@1 0.6043
cosine_precision@3 0.5781
cosine_precision@5 0.4362
cosine_precision@10 0.2467
cosine_recall@1 0.2116
cosine_recall@3 0.5623
cosine_recall@5 0.6852
cosine_recall@10 0.7701
cosine_ndcg@10 0.6959
cosine_mrr@10 0.6475
cosine_map@100 0.6831

Information Retrieval (dim_256, embeddings truncated to 256 dimensions)

Metric Value
cosine_accuracy@1 0.592
cosine_accuracy@3 0.6476
cosine_accuracy@5 0.7002
cosine_accuracy@10 0.762
cosine_precision@1 0.592
cosine_precision@3 0.5667
cosine_precision@5 0.4235
cosine_precision@10 0.2416
cosine_recall@1 0.2078
cosine_recall@3 0.5522
cosine_recall@5 0.6649
cosine_recall@10 0.7508
cosine_ndcg@10 0.6804
cosine_mrr@10 0.6336
cosine_map@100 0.669

Information Retrieval (dim_128, embeddings truncated to 128 dimensions)

Metric Value
cosine_accuracy@1 0.5209
cosine_accuracy@3 0.5688
cosine_accuracy@5 0.6383
cosine_accuracy@10 0.6909
cosine_precision@1 0.5209
cosine_precision@3 0.5013
cosine_precision@5 0.3821
cosine_precision@10 0.2189
cosine_recall@1 0.1811
cosine_recall@3 0.4851
cosine_recall@5 0.5961
cosine_recall@10 0.6832
cosine_ndcg@10 0.6095
cosine_mrr@10 0.5623
cosine_map@100 0.6014

Information Retrieval (dim_64, embeddings truncated to 64 dimensions)

Metric Value
cosine_accuracy@1 0.3818
cosine_accuracy@3 0.4405
cosine_accuracy@5 0.5224
cosine_accuracy@10 0.609
cosine_precision@1 0.3818
cosine_precision@3 0.3745
cosine_precision@5 0.2992
cosine_precision@10 0.1892
cosine_recall@1 0.1343
cosine_recall@3 0.3681
cosine_recall@5 0.4744
cosine_recall@10 0.5899
cosine_ndcg@10 0.495
cosine_mrr@10 0.4345
cosine_map@100 0.4799
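
The metrics in these tables are standard ranking measures. As a rough illustration of how they relate, here is a minimal sketch that scores a single hypothetical query with binary relevance. The function name `ir_metrics`, the passage ids, and the relevance set are invented for the example, and the code is not the evaluator that produced the numbers above.

```python
import math

def ir_metrics(ranked_ids, relevant_ids, k):
    """Compute accuracy@k, precision@k, recall@k, MRR@k and NDCG@k for one query."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]
    accuracy = 1.0 if any(hits) else 0.0
    precision = sum(hits) / k
    recall = sum(hits) / len(relevant_ids)
    # Reciprocal rank of the first relevant result (0 if none appears in the top k).
    mrr = next((1.0 / (rank + 1) for rank, hit in enumerate(hits) if hit), 0.0)
    # Binary-relevance DCG, normalized by the best achievable DCG at this cutoff.
    dcg = sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "mrr": mrr, "ndcg": ndcg}

# Hypothetical query with three relevant passages; the retriever ranks five candidates.
ranked = ["p7", "p2", "p9", "p4", "p1"]
relevant = {"p2", "p4", "p5"}

for k in (1, 3, 5):
    print(k, ir_metrics(ranked, relevant, k))
```

When a query has only a few relevant passages, precision@k necessarily falls as k grows while recall@k rises, which is the pattern visible in the tables.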

Training Details

Training Dataset

Unnamed Dataset

  • Size: 11,644 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
               anchor          positive
    type       string          string
    min        9 tokens        7 tokens
    mean       56.58 tokens    57.48 tokens
    max        170 tokens      170 tokens
  • Samples:
    Sample 1
      anchor: What type of camera recorded the video in question?
      positive:
        conference following the defense’s objection to the video’s admission, during which the
        court and the State discussed how to lay a proper foundation for authenticating the video:
        [MR. MOONEY’S COUNSEL]: I mean, there’s no way to know if that
        video’s been altered. It’s somebody else’s Ring camera. These aren’t still
        photographs of what happened.

        THE COURT: Has he watched it?
    Sample 2
      anchor:
        City Department of Education, the self-represented plaintiff
        submitted a filing containing hallucinations. No. 24-cv-04232,

        20
        2024 WL 3460049, at *7 (S.D.N.Y. July 18, 2024) (unpublished
        opinion). The court noted that “[s]anctions may be imposed for
        submitting false and nonexistent legal authority to the [c]ourt.” Id.
        However, the court declined to impose sanctions due to the
      positive: For what reason did the court note sanctions could be imposed?
    Sample 3
      anchor: What can happen to someone unfamiliar with the limitations of generative artificial intelligence tools?
      positive:
        Since the use of generative artificial intelligence (GAI) tools has
        become widespread, lawyers and self-represented litigants alike
        have relied on them to draft court filings. Because the most
        commonly used GAI tools were not designed to create legal
        documents, a person unfamiliar with the limitations of GAI tools,
        such as the appellant in this case, can unwittingly produce text
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
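
To make the configuration above concrete, the following sketch computes a Matryoshka-style objective by hand: an in-batch ranking loss (the idea behind MultipleNegativesRankingLoss) is evaluated on each truncated prefix of the embeddings and the results are combined with the given weights. It is a simplified illustration in pure Python, not the library code. The function names, the toy 4-dimensional batch, and the similarity scale of 20 are assumptions made for the example.

```python
import math

def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities.

    For each anchor, its own positive is the target and the positives of the
    other pairs in the batch act as negatives.
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * sum(x * y for x, y in zip(anchor, p)) for p in positives]
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(logit - max_logit) for logit in logits))
        total += log_sum_exp - logits[i]  # negative log-probability of the true pair
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over each truncated embedding prefix."""
    return sum(
        weight * in_batch_ranking_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, weight in zip(dims, weights)
    )

# Toy batch of three (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.1, 0.0, 0.2], [0.0, 1.0, 0.1, 0.0], [0.1, 0.0, 1.0, 0.3]]
positives = [[0.9, 0.0, 0.1, 0.1], [0.1, 0.8, 0.0, 0.2], [0.0, 0.2, 0.9, 0.1]]

loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(round(loss, 4))
```

With equal weights, as in this card, every prefix length contributes equally, so the leading dimensions are pushed to carry most of the ranking signal on their own.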
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 4
  • learning_rate: 2e-05
  • lr_scheduler_type: cosine
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • gradient_accumulation_steps: 16
  • bf16: True
  • tf32: True
  • eval_strategy: epoch
  • per_device_eval_batch_size: 16
  • load_best_model_at_end: True
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
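
The batch settings above determine how many optimizer steps one epoch contains. The short calculation below is a sketch of that arithmetic, assuming training ran on a single device (the card does not state the device count) and that the batch sampler did not change the number of batches.

```python
import math

# Values reported in this card.
num_samples = 11_644
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_train_epochs = 4

# Assumption: a single training device, so no extra multiplier applies.
num_devices = 1

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
batches_per_epoch = math.ceil(num_samples / (per_device_train_batch_size * num_devices))
steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
total_steps = steps_per_epoch * num_train_epochs

print(effective_batch_size, batches_per_epoch, steps_per_epoch, total_steps)
```

Under these assumptions the result is 23 optimizer steps per epoch and 92 in total, which agrees with the Training Logs section, where the epochs end at steps 23, 46, 69, and 92.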

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 4
  • max_steps: -1
  • learning_rate: 2e-05
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 16
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: epoch
  • per_device_eval_batch_size: 16
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: 0.1
  • local_rank: -1
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.4396 10 7.6935 - - - - -
0.8791 20 3.9596 - - - - -
1.0 23 - 0.6763 0.6624 0.6314 0.5678 0.4394
1.3077 30 2.5402 - - - - -
1.7473 40 2.1285 - - - - -
2.0 46 - 0.7016 0.6879 0.6626 0.5922 0.4883
2.1758 50 1.8468 - - - - -
2.6154 60 1.5615 - - - - -
3.0 69 - 0.7094 0.6954 0.6772 0.6024 0.4968
3.0440 70 1.4709 - - - - -
3.4835 80 1.4857 - - - - -
3.9231 90 1.3707 - - - - -
4.0 92 - 0.7106 0.6959 0.6804 0.6095 0.495
  • The saved checkpoint is the last row (epoch 4.0, step 92); its NDCG@10 values match the evaluation tables above.
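
The final-row scores can be read as a size versus quality trade-off. The snippet below is a small sketch that uses only the NDCG@10 values reported in the last row of the table and expresses each truncated size relative to the full 768 dimensions; the variable names are chosen for the example.

```python
# NDCG@10 at epoch 4.0 (step 92) for each embedding size, copied from the table above.
ndcg_at_10 = {768: 0.7106, 512: 0.6959, 256: 0.6804, 128: 0.6095, 64: 0.495}

full_dim = 768
for dim, score in ndcg_at_10.items():
    retained = score / ndcg_at_10[full_dim]  # share of full-size NDCG@10 kept
    size = dim / full_dim                    # share of full-size storage used
    print(f"dim={dim:>3}  ndcg@10={score:.4f}  quality={retained:.1%}  size={size:.1%}")
```

On these numbers, 256 dimensions keep about 96% of the full-size NDCG@10 at one third of the storage, while 64 dimensions keep about 70%.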

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.2.3
  • Transformers: 5.3.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.13.0
  • Datasets: 4.6.1
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}