SentenceTransformer based on Qwen/Qwen3-Embedding-0.6B

This is a sentence-transformers model finetuned from Qwen/Qwen3-Embedding-0.6B on the flash_rag_datasets dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-Embedding-0.6B
  • Maximum Sequence Length: 32768 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: flash_rag_datasets
  • Language: en

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
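The module stack above applies last-token pooling (`pooling_mode_lasttoken: True`) followed by L2 normalization. A minimal NumPy sketch of that pooling step, on a hypothetical toy batch (the real model produces 1024-dimensional vectors):

```python
import numpy as np

def last_token_pool(token_embeddings, attention_mask):
    """Select the embedding of the last non-padding token in each sequence,
    then L2-normalize it, matching the Pooling + Normalize() modules."""
    last_idx = attention_mask.sum(axis=1) - 1  # index of last attended token
    pooled = token_embeddings[np.arange(token_embeddings.shape[0]), last_idx]
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# toy batch: 2 sequences, 4 tokens, 3-dim embeddings (illustration only)
emb = np.arange(24, dtype=float).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 0], [1, 1, 1, 1]])  # first sequence is padded
vecs = last_token_pool(emb, mask)
print(vecs.shape)  # (2, 3)
```

Because of the final `Normalize()` module, every embedding the model returns has unit length.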

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zyc-zju/qwen3-embedding-0.6b_search-r1_2wiki_lsr")
# Run inference
queries = [
    "Do both films Country (film) and Raid in St. Pauli have the directors that share the same nationality?",
]
documents = [
    'no',
    '13 October 1952',
    'yes',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 1024] [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.2584, 0.2811, 0.2387]])
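Since the model's embeddings are L2-normalized, cosine similarity reduces to a plain dot product. A sketch with random unit vectors standing in for the model's output (shapes match the example above):

```python
import numpy as np

# hypothetical stand-ins for query/document embeddings from the model
rng = np.random.default_rng(0)
q = rng.normal(size=(1, 1024))
d = rng.normal(size=(3, 1024))
q /= np.linalg.norm(q, axis=1, keepdims=True)  # the model's Normalize() step
d /= np.linalg.norm(d, axis=1, keepdims=True)

# for unit vectors, cosine similarity is just the dot product
similarities = q @ d.T
print(similarities.shape)  # (1, 3)
```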

Training Details

Training Dataset

flash_rag_datasets

  • Dataset: flash_rag_datasets at bcafb8d
  • Size: 15,000 training samples
  • Columns: query and response
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 9 tokens, mean 18.86 tokens, max 38 tokens
    • response (string): min 2 tokens, mean 4.44 tokens, max 15 tokens
  • Samples (query → response):
    • Are director of film Move (1970 Film) and director of film Méditerranée (1963 Film) from the same country? → no
    • Do both films The Falcon (Film) and Valentin The Good have the directors from the same country? → no
    • Which film whose director is younger, Charge It To Me or Danger: Diabolik? → Danger: Diabolik
  • Loss: fed_rag.loss.pytorch.lsr.LSRLoss

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 1
  • gradient_accumulation_steps: 16
  • learning_rate: 1e-05
  • max_steps: 100
  • lr_scheduler_type: constant
  • remove_unused_columns: False
  • dataloader_pin_memory: False
  • push_to_hub: True
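With a per-device batch size of 1 and 16 gradient accumulation steps, each optimizer update sees an effective batch of 16 examples. A minimal NumPy sketch of that accumulation pattern, using a toy squared-error loss (not the actual LSRLoss):

```python
import numpy as np

# toy linear model: accumulate gradients over 16 micro-batches of size 1,
# then apply a single update (effective batch size 16)
rng = np.random.default_rng(42)
w = np.zeros(4)                   # toy parameter vector
x = rng.normal(size=(16, 4))      # 16 single-example micro-batches
y = rng.normal(size=16)
lr = 1e-5                         # matches the constant learning rate above

grad = np.zeros_like(w)
for i in range(16):               # gradient_accumulation_steps = 16
    err = x[i] @ w - y[i]         # squared-error loss on one example
    grad += err * x[i] / 16       # average gradient over the accumulated batch
w -= lr * grad                    # one update, as if batch size were 16
print(w.shape)  # (4,)
```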

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 1
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 100
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: False
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0011 1 0.0005
0.0021 2 0.0006
0.0032 3 0.0004
0.0043 4 0.0003
0.0053 5 0.0008
0.0064 6 0.0003
0.0075 7 0.0005
0.0085 8 0.0005
0.0096 9 0.0002
0.0107 10 0.0005
0.0117 11 0.0005
0.0128 12 0.0004
0.0139 13 0.0005
0.0149 14 0.0004
0.016 15 0.0002
0.0171 16 0.0005
0.0181 17 0.0006
0.0192 18 0.0004
0.0203 19 0.0004
0.0213 20 0.0002
0.0224 21 0.0003
0.0235 22 0.0004
0.0245 23 0.0006
0.0256 24 0.0004
0.0267 25 0.0003
0.0277 26 0.0006
0.0288 27 0.0003
0.0299 28 0.0006
0.0309 29 0.0006
0.032 30 0.0005
0.0331 31 0.0003
0.0341 32 0.0003
0.0352 33 0.0003
0.0363 34 0.0005
0.0373 35 0.0004
0.0384 36 0.0004
0.0395 37 0.0004
0.0405 38 0.0007
0.0416 39 0.0003
0.0427 40 0.0003
0.0437 41 0.0002
0.0448 42 0.0004
0.0459 43 0.0006
0.0469 44 0.0005
0.048 45 0.0003
0.0491 46 0.0006
0.0501 47 0.0005
0.0512 48 0.0004
0.0523 49 0.0007
0.0533 50 0.0006
0.0544 51 0.0005
0.0555 52 0.0004
0.0565 53 0.0004
0.0576 54 0.0006
0.0587 55 0.0005
0.0597 56 0.0003
0.0608 57 0.0003
0.0619 58 0.0004
0.0629 59 0.0004
0.064 60 0.0007
0.0651 61 0.0007
0.0661 62 0.0004
0.0672 63 0.0004
0.0683 64 0.0005
0.0693 65 0.0004
0.0704 66 0.0003
0.0715 67 0.0007
0.0725 68 0.0003
0.0736 69 0.0005
0.0747 70 0.0005
0.0757 71 0.0004
0.0768 72 0.0004
0.0779 73 0.0003
0.0789 74 0.0003
0.08 75 0.0007
0.0811 76 0.0007
0.0821 77 0.0006
0.0832 78 0.0006
0.0843 79 0.0002
0.0853 80 0.0004
0.0864 81 0.0008
0.0875 82 0.0005
0.0885 83 0.0005
0.0896 84 0.0004
0.0907 85 0.0004
0.0917 86 0.0006
0.0928 87 0.0007
0.0939 88 0.0006
0.0949 89 0.0004
0.096 90 0.0004
0.0971 91 0.0004
0.0981 92 0.0005
0.0992 93 0.0006
0.1003 94 0.0007
0.1013 95 0.0004
0.1024 96 0.0004
0.1035 97 0.0005
0.1045 98 0.0005
0.1056 99 0.0003
0.1067 100 0.0004

Framework Versions

  • Python: 3.11.14
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.2
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
