SentenceTransformer based on Qwen/Qwen3-Embedding-4B

This is a sentence-transformers model finetuned from Qwen/Qwen3-Embedding-4B on the flash_rag_datasets dataset. It maps sentences & paragraphs to a 2560-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-Embedding-4B
  • Maximum Sequence Length: 40960 tokens
  • Output Dimensionality: 2560 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: flash_rag_datasets
  • Language: en

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 40960, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 2560, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
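The Pooling configuration above uses last-token pooling (`'pooling_mode_lasttoken': True`): the sentence embedding is the hidden state of the last non-padding token in each sequence. A minimal illustrative sketch in plain Python with toy data (not the library's implementation):

```python
def last_token_pool(token_embeddings, attention_mask):
    """token_embeddings: per-sequence list of [seq_len][dim] vectors;
    attention_mask: per-sequence list of 0/1 flags (1 = real token)."""
    pooled = []
    for emb, mask in zip(token_embeddings, attention_mask):
        last = max(i for i, m in enumerate(mask) if m == 1)  # last real token
        pooled.append(emb[last])
    return pooled

# Two sequences of length 4 with dim 2; the second is padded after 2 tokens.
embs = [[[1, 1], [2, 2], [3, 3], [4, 4]],
        [[5, 5], [6, 6], [0, 0], [0, 0]]]
masks = [[1, 1, 1, 1], [1, 1, 0, 0]]
print(last_token_pool(embs, masks))  # [[4, 4], [6, 6]]
```

The pooled vectors are then passed through the Normalize() module, yielding unit-length 2560-dimensional embeddings.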

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("qwen3-embedding-4b_qwen3-8b_nq_lsr")
# Run inference
queries = [
    "when did captain crunch oops all berries come out",
]
documents = [
    'First released in 1997',
    'Shel Silverstein',
    'Notre Dame',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 2560] [3, 2560]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.2967, 0.2414, 0.1715]])
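Because the model ends with a Normalize() module, every embedding has unit length, so the cosine similarity shown above is numerically identical to a plain dot product. A small self-contained check of that identity (plain Python, illustrative only):

```python
import math
import random

random.seed(0)

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Two random unit vectors standing in for a query and a document embedding.
q = normalize([random.gauss(0, 1) for _ in range(8)])
d = normalize([random.gauss(0, 1) for _ in range(8)])

dot = sum(a * b for a, b in zip(q, d))
cos = dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d)))
print(abs(cos - dot) < 1e-9)  # True: cosine == dot product for unit vectors
```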

Training Details

Training Dataset

flash_rag_datasets

  • Dataset: flash_rag_datasets at bcafb8d
  • Size: 79,168 training samples
  • Columns: query and response
  • Approximate statistics based on the first 1000 samples:
    • query: string (min: 9 tokens, mean: 11.35 tokens, max: 25 tokens)
    • response: string (min: 2 tokens, mean: 5.31 tokens, max: 16 tokens)
  • Samples:
    • query: "total number of death row inmates in the us" → response: "2,718"
    • query: "big little lies season 2 how many episodes" → response: "seven"
    • query: "who sang waiting for a girl like you" → response: "Foreigner"
  • Loss: fed_rag.loss.pytorch.lsr.LSRLoss
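fed-rag's LSRLoss implements an LM-supervised retrieval objective: the retriever's score distribution over candidate passages is pushed toward the distribution implied by a language model's judgment (here plausibly Qwen3-8B, going by the model name). The sketch below shows only the core idea, a KL divergence between two softmax distributions; the function names and scores are illustrative, not the library's API:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def lsr_kl(retriever_scores, lm_scores):
    """KL(lm || retriever): penalizes the retriever for disagreeing with the LM."""
    p = softmax(lm_scores)         # target: LM-judged usefulness of each passage
    q = softmax(retriever_scores)  # prediction: retriever similarity scores
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# The loss shrinks when the retriever's ranking agrees with the LM's:
aligned = lsr_kl([2.0, 0.5, -1.0], [3.0, 1.0, -2.0])
misaligned = lsr_kl([-1.0, 0.5, 2.0], [3.0, 1.0, -2.0])
print(aligned < misaligned)  # True
```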

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 1
  • gradient_accumulation_steps: 16
  • learning_rate: 1e-05
  • max_steps: 100
  • lr_scheduler_type: constant
  • remove_unused_columns: False
  • dataloader_pin_memory: False
  • push_to_hub: True
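A quick sanity check on what these settings imply: the effective batch size is per_device_train_batch_size × gradient_accumulation_steps, and with max_steps capped at 100 the run covers only a small fraction of the 79,168 training samples, matching the final epoch value of roughly 0.0202 in the training log below.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
max_steps = 100
dataset_size = 79_168  # flash_rag_datasets training samples

effective_batch = per_device_train_batch_size * gradient_accumulation_steps
samples_seen = effective_batch * max_steps
epochs_completed = samples_seen / dataset_size
print(effective_batch, samples_seen, round(epochs_completed, 4))
# 16 1600 0.0202
```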

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 1
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 100
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: False
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0002 1 0.0004
0.0004 2 0.0004
0.0006 3 0.0008
0.0008 4 0.0003
0.0010 5 0.0002
0.0012 6 0.0002
0.0014 7 0.0003
0.0016 8 0.0004
0.0018 9 0.0007
0.0020 10 0.0003
0.0022 11 0.0007
0.0024 12 0.0003
0.0026 13 0.0004
0.0028 14 0.0004
0.0030 15 0.0004
0.0032 16 0.0006
0.0034 17 0.0005
0.0036 18 0.0005
0.0038 19 0.0003
0.0040 20 0.0006
0.0042 21 0.0003
0.0044 22 0.0004
0.0046 23 0.0004
0.0049 24 0.0003
0.0051 25 0.0004
0.0053 26 0.0005
0.0055 27 0.0004
0.0057 28 0.0003
0.0059 29 0.0003
0.0061 30 0.0005
0.0063 31 0.0004
0.0065 32 0.0003
0.0067 33 0.0003
0.0069 34 0.0007
0.0071 35 0.0002
0.0073 36 0.0003
0.0075 37 0.0003
0.0077 38 0.0004
0.0079 39 0.0011
0.0081 40 0.0004
0.0083 41 0.0004
0.0085 42 0.0002
0.0087 43 0.0003
0.0089 44 0.0004
0.0091 45 0.0003
0.0093 46 0.0004
0.0095 47 0.0006
0.0097 48 0.0004
0.0099 49 0.0003
0.0101 50 0.0003
0.0103 51 0.0004
0.0105 52 0.0002
0.0107 53 0.0003
0.0109 54 0.0003
0.0111 55 0.0004
0.0113 56 0.0009
0.0115 57 0.0012
0.0117 58 0.0003
0.0119 59 0.0003
0.0121 60 0.0004
0.0123 61 0.0005
0.0125 62 0.0006
0.0127 63 0.0003
0.0129 64 0.0004
0.0131 65 0.0004
0.0133 66 0.0005
0.0135 67 0.0003
0.0137 68 0.0006
0.0139 69 0.0004
0.0141 70 0.0003
0.0143 71 0.0005
0.0146 72 0.0003
0.0148 73 0.0003
0.0150 74 0.0003
0.0152 75 0.0004
0.0154 76 0.0005
0.0156 77 0.0002
0.0158 78 0.0005
0.0160 79 0.0003
0.0162 80 0.0003
0.0164 81 0.0004
0.0166 82 0.0005
0.0168 83 0.0003
0.0170 84 0.0003
0.0172 85 0.0003
0.0174 86 0.0004
0.0176 87 0.0001
0.0178 88 0.0004
0.0180 89 0.0004
0.0182 90 0.0003
0.0184 91 0.0005
0.0186 92 0.0003
0.0188 93 0.0003
0.0190 94 0.0003
0.0192 95 0.0003
0.0194 96 0.0005
0.0196 97 0.0006
0.0198 98 0.0003
0.0200 99 0.0003
0.0202 100 0.0004

Framework Versions

  • Python: 3.11.14
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.2
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model tree for zyc-zju/qwen3-embedding-4b_qwen3-8b_nq_lsr

  • Base model: Qwen/Qwen3-4B-Base
  • Dataset used to train: flash_rag_datasets