SentenceTransformer based on Qwen/Qwen3-Embedding-4B

This is a sentence-transformers model finetuned from Qwen/Qwen3-Embedding-4B on the flash_rag_datasets dataset. It maps sentences & paragraphs to a 2560-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-Embedding-4B
  • Maximum Sequence Length: 40960 tokens
  • Output Dimensionality: 2560 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: flash_rag_datasets
  • Language: en

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 40960, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 2560, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
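
The Pooling module selects the hidden state of the last non-padding token (`pooling_mode_lasttoken: True`) and the Normalize module L2-normalizes the result. A minimal sketch of those two steps in plain PyTorch; the tensor names and toy shapes here are illustrative, not the library's internals:

```python
import torch

def last_token_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Pick the hidden state of the last attended token per sequence, then L2-normalize."""
    # Index of the last non-padding token in each sequence
    last_idx = attention_mask.sum(dim=1) - 1            # shape: (batch,)
    batch_idx = torch.arange(hidden_states.size(0))
    pooled = hidden_states[batch_idx, last_idx]         # shape: (batch, hidden)
    # Normalize so cosine similarity reduces to a dot product
    return torch.nn.functional.normalize(pooled, p=2, dim=1)

# Toy example: batch of 2 sequences (lengths 3 and 5), hidden size 4
hidden = torch.randn(2, 5, 4)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
emb = last_token_pool(hidden, mask)
print(emb.shape)  # torch.Size([2, 4])
```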

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zyc-zju/qwen3-embedding-4b_qwen3-8b_hotpotqa_lsr")
# Run inference
queries = [
    "Huddersfield Giants R.L.F.C. are an English professional rugby league club from Huddersfield, West Yorkshire, the birthplace of rugby league, who play in the Super League competition, they play their home games at the Kirklees Stadium which is shared with Huddersfield Town F.C., is a multi-use sports stadium in Huddersfield in West Yorkshire, in which country?",
]
documents = [
    'England',
    'Alan Menken',
    'lead singer',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 2560] [3, 2560]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.2828, 0.1562, 0.1402]])
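
Because the Normalize module guarantees unit-length embeddings, cosine similarity reduces to a dot product, so ranking a corpus against a query is a matrix-vector product followed by a sort. A sketch using NumPy, with random placeholder vectors standing in for real `encode_query` / `encode_document` output (all names here are illustrative):

```python
import numpy as np

# Placeholder embeddings in place of model output (already L2-normalized,
# as the model's Normalize() module guarantees)
rng = np.random.default_rng(0)
doc_emb = rng.normal(size=(3, 8))
doc_emb /= np.linalg.norm(doc_emb, axis=1, keepdims=True)

# A query close to document 0: document 0 plus a small perturbation
query_emb = doc_emb[0] + 0.1 * rng.normal(size=8)
query_emb /= np.linalg.norm(query_emb)

# Cosine similarity of normalized vectors is a dot product
scores = doc_emb @ query_emb
ranking = np.argsort(-scores)  # best match first
print(int(ranking[0]))         # index of the top-ranked document
```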

Training Details

Training Dataset

flash_rag_datasets

  • Dataset: flash_rag_datasets at bcafb8d
  • Size: 90,447 training samples
  • Columns: query and response
  • Approximate statistics based on the first 1000 samples:
    query: string; min: 8 tokens, mean: 25.45 tokens, max: 147 tokens
    response: string; min: 2 tokens, mean: 4.93 tokens, max: 73 tokens
  • Samples:
    query: Which magazine was started first Arthur's Magazine or First for Women?
    response: Arthur's Magazine
    query: The Oberoi family is part of a hotel company that has a head office in what city?
    response: Delhi
    query: Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?
    response: President Richard Nixon
  • Loss: fed_rag.loss.pytorch.lsr.LSRLoss
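
fed_rag's LSRLoss implements LM-Supervised Retrieval, which trains the retriever so that its score distribution over candidate passages matches the downstream language model's. The exact interface is fed_rag's own; purely as an illustration, an LSR-style objective can be written as a KL divergence between the two softmax-normalized score distributions (the function name and inputs below are hypothetical, not fed_rag's API):

```python
import torch
import torch.nn.functional as F

def lsr_style_loss(retriever_scores: torch.Tensor, lm_scores: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student): align the retriever's distribution with the LM's."""
    student_logprobs = F.log_softmax(retriever_scores, dim=-1)  # student: retriever
    teacher_probs = F.softmax(lm_scores, dim=-1)                # teacher: language model
    return F.kl_div(student_logprobs, teacher_probs, reduction="batchmean")

# Toy scores for one query over three candidate passages
retriever_scores = torch.tensor([[0.9, 0.2, 0.1]])
lm_scores = torch.tensor([[2.0, 0.5, 0.3]])
print(lsr_style_loss(retriever_scores, lm_scores))
```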

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 1
  • gradient_accumulation_steps: 16
  • learning_rate: 1e-05
  • max_steps: 100
  • lr_scheduler_type: constant
  • remove_unused_columns: False
  • dataloader_pin_memory: False
  • push_to_hub: True
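
With per_device_train_batch_size of 1 and gradient_accumulation_steps of 16, gradients are summed over 16 micro-batches before each optimizer step, for an effective batch size of 16. The accumulation pattern looks roughly like the following generic PyTorch sketch (not the Trainer's actual internals; model and loss are placeholders):

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
accum_steps = 16  # gradient_accumulation_steps

optimizer.zero_grad()
for step in range(accum_steps):
    x = torch.randn(1, 4)            # per-device micro-batch of size 1
    loss = model(x).pow(2).mean()    # placeholder loss
    (loss / accum_steps).backward()  # scale so gradients average over 16 samples
optimizer.step()                     # one parameter update per 16 micro-batches
```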

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 1
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 100
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: False
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0002 1 0.0009
0.0004 2 0.0009
0.0005 3 0.0006
0.0007 4 0.0005
0.0009 5 0.001
0.0011 6 0.0004
0.0012 7 0.0008
0.0014 8 0.0009
0.0016 9 0.0011
0.0018 10 0.0014
0.0019 11 0.0012
0.0021 12 0.0015
0.0023 13 0.0009
0.0025 14 0.0005
0.0027 15 0.0005
0.0028 16 0.0013
0.0030 17 0.001
0.0032 18 0.0005
0.0034 19 0.0005
0.0035 20 0.001
0.0037 21 0.0007
0.0039 22 0.0015
0.0041 23 0.0013
0.0042 24 0.0014
0.0044 25 0.0009
0.0046 26 0.0012
0.0048 27 0.0011
0.0050 28 0.0011
0.0051 29 0.0007
0.0053 30 0.0007
0.0055 31 0.0011
0.0057 32 0.0014
0.0058 33 0.0006
0.0060 34 0.0008
0.0062 35 0.0012
0.0064 36 0.0006
0.0065 37 0.0008
0.0067 38 0.0006
0.0069 39 0.0008
0.0071 40 0.0005
0.0073 41 0.0008
0.0074 42 0.0008
0.0076 43 0.0013
0.0078 44 0.0005
0.0080 45 0.0009
0.0081 46 0.0007
0.0083 47 0.001
0.0085 48 0.0009
0.0087 49 0.001
0.0088 50 0.001
0.0090 51 0.0015
0.0092 52 0.0006
0.0094 53 0.0009
0.0096 54 0.0009
0.0097 55 0.0012
0.0099 56 0.0007
0.0101 57 0.0006
0.0103 58 0.0006
0.0104 59 0.0006
0.0106 60 0.0005
0.0108 61 0.0004
0.0110 62 0.0008
0.0111 63 0.001
0.0113 64 0.0012
0.0115 65 0.0011
0.0117 66 0.001
0.0119 67 0.0011
0.0120 68 0.0011
0.0122 69 0.0012
0.0124 70 0.0008
0.0126 71 0.0006
0.0127 72 0.0006
0.0129 73 0.0006
0.0131 74 0.0015
0.0133 75 0.0011
0.0134 76 0.0009
0.0136 77 0.0012
0.0138 78 0.0008
0.0140 79 0.0011
0.0142 80 0.0008
0.0143 81 0.0009
0.0145 82 0.0009
0.0147 83 0.0007
0.0149 84 0.0007
0.0150 85 0.0008
0.0152 86 0.0012
0.0154 87 0.001
0.0156 88 0.0003
0.0157 89 0.0003
0.0159 90 0.001
0.0161 91 0.0008
0.0163 92 0.0006
0.0165 93 0.0009
0.0166 94 0.0012
0.0168 95 0.0012
0.0170 96 0.0007
0.0172 97 0.0007
0.0173 98 0.001
0.0175 99 0.0008
0.0177 100 0.0007

Framework Versions

  • Python: 3.11.14
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.2
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}