thenlper committed
Commit 2fd60e3 · verified · 1 Parent(s): db0a90a

Update README.md

Files changed (1):
  1. README.md (+31 -11)
README.md CHANGED
@@ -1,7 +1,15 @@
+ ---
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen3-VL-8B-Instruct
+ tags:
+ - transformers
+ - multimodal embedding
+ ---
 # Qwen3-VL-Embedding-8B

 <p align="center">
- <img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/qwen3_vl_embedding_logo.png" width="400"/>
+ <img src="https://docs.qwenlm.ai/resources/ed7dd9fa-7afd-4364-902a-ee04eafd89cc.png" width="400"/>
 <p>

 ## Highlights
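The YAML block added at the top of the README is Hugging Face model-card front matter (license, base model, tags). As a rough, optional sketch of how that metadata can be read back programmatically, the snippet below uses the `huggingface_hub` model-card API; the repo id `Qwen/Qwen3-VL-Embedding-8B` matches the model list later in this diff, and the expected values simply mirror the front matter above.

```python
from huggingface_hub import ModelCard

# Fetch the README from the Hub and parse its YAML front matter
# into structured model-card metadata.
card = ModelCard.load("Qwen/Qwen3-VL-Embedding-8B")

print(card.data.license)     # expected: apache-2.0 (per the front matter above)
print(card.data.base_model)  # expected: ['Qwen/Qwen3-VL-8B-Instruct']
print(card.data.tags)        # expected: ['transformers', 'multimodal embedding']
```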
@@ -10,13 +18,13 @@ The **Qwen3-VL-Embedding** and **Qwen3-VL-Reranker** model series are the latest
 
 While the Embedding model generates high-dimensional vectors for broad applications like retrieval and clustering, the Reranker model is engineered to refine these results, establishing a comprehensive pipeline for state-of-the-art multimodal search.

- - **Multimodal Versatility**: Both models seamlessly process inputs containing text, images, screenshots, and video within a unified framework. They achieve state-of-the-art performance across diverse multimodal tasks, including image-text retrieval, video-text matching, visual question answering (VQA), and multimodal content clustering.
+ - **Multimodal Versatility**: Both models seamlessly handle a wide range of inputs—including text, images, screenshots, and video—within a unified framework. They deliver state-of-the-art performance across diverse multimodal tasks such as image-text retrieval, video-text matching, visual question answering (VQA), and multimodal content clustering.

 - **Unified Representation Learning (Embedding)**: By leveraging the Qwen3-VL architecture, the Embedding model generates semantically rich vectors that capture both visual and textual information in a shared space. This facilitates efficient similarity computation and retrieval across different modalities.

- - **High-Precision Reranking (Reranker)**: We simultaneously provide the Qwen3-VL-Reranker series to complement the embedding model. The Reranker accepts an input pair (Query, Document)—where both the query and document can consist of arbitrary single or mixed modalities—and outputs a precise relevance score. In retrieval scenarios, the Embedding and Reranker models are typically used in tandem: the embedding model handles the initial recall stage, while the reranker manages the re-ranking stage. This two-step process significantly enhances the final retrieval accuracy.
+ - **High-Precision Reranking (Reranker)**: We also introduce the Qwen3-VL-Reranker series to complement the embedding model. The reranker takes a (query, document) pair as input—where both query and document may contain arbitrary single or mixed modalities—and outputs a precise relevance score. In retrieval pipelines, the two models are typically used in tandem: the embedding model performs efficient initial recall, while the reranker refines results in a subsequent re-ranking stage. This two-stage approach significantly boosts retrieval accuracy.

- - **Exceptional Practicality**: Inheriting Qwen3-VL's multilingual capabilities, the series supports over **30** languages, making it ideal for global applications. It is highly practical for real-world scenarios, offering flexible vector dimensions, customizable instructions for specific use cases, and strong performance even with quantized models. These features allow developers to easily integrate both models into existing pipelines for applications requiring robust cross-lingual and cross-modal understanding.
+ - **Exceptional Practicality**: Inheriting Qwen3-VL's multilingual capabilities, the series supports over 30 languages, making it ideal for global applications. It is highly practical for real-world scenarios, offering flexible vector dimensions, customizable instructions for specific use cases, and strong performance even with quantized embeddings. These capabilities enable developers to seamlessly integrate both models into existing pipelines, unlocking powerful cross-lingual and cross-modal understanding.

 ## Model Overview

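The Highlights above describe the usual recall-then-rerank flow: the embedding model cheaply narrows a large corpus, and the reranker rescores the shortlist with a joint view of each (query, document) pair. The sketch below illustrates only that control flow; `embed` and `rerank` are hypothetical placeholders (random projections and a dot product), not the actual Qwen3-VL-Embedding or Qwen3-VL-Reranker APIs.

```python
import numpy as np

# Placeholder scorers so the sketch runs end to end. In a real pipeline these
# would wrap Qwen3-VL-Embedding and Qwen3-VL-Reranker inference; the names and
# signatures here are illustrative only.
def embed(item: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(item)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)  # unit-norm "embedding"

def rerank(query: str, document: str) -> float:
    # A real reranker scores the (query, document) pair jointly; faked here.
    return float(embed(query) @ embed(document))

def search(query: str, corpus: list[str], recall_k: int = 100, top_k: int = 10):
    """Two-stage retrieval: embedding-based recall, then precise re-ranking."""
    # Stage 1: recall, i.e. rank the whole corpus by embedding similarity.
    q = embed(query)
    doc_matrix = np.stack([embed(d) for d in corpus])
    candidates = np.argsort(-(doc_matrix @ q))[:recall_k]
    # Stage 2: re-rank only the recalled candidates with the heavier model.
    scored = [(int(i), rerank(query, corpus[i])) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

print(search("a cat playing piano", ["a dog", "a kitten on a keyboard", "a car"], top_k=2))
```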
@@ -29,16 +37,16 @@ While the Embedding model generates high-dimensional vectors for broad applicati
 - Context Length: 32k
 - Embedding Dimension: Up to 4096, supports user-defined output dimensions ranging from 64 to 4096

- For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-embedding/), [GitHub](https://github.com/QwenLM/Qwen3-VL-Embedding).
+ For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwen.ai/blog?id=qwen3-vl-embedding) and [GitHub](https://github.com/QwenLM/Qwen3-VL-Embedding).

 ## Qwen3-VL-Embedding and Qwen3-VL-Reranker Model list

 | Model | Size | Model Layers | Sequence Length | Embedding Dimension | Quantization Support | MRL Support | Instruction Aware |
 |---|---|---|---|---|---|---|---|
- | [Qwen3-VL-Embedding-2B] | 2B | 28 | 32K | 2048 | Yes | Yes | Yes |
- | [Qwen3-VL-Embedding-8B] | 8B | 36 | 32K | 4096 | Yes | Yes | Yes |
- | [Qwen3-VL-Reranker-2B] | 2B | 28 | 32K | - | - | - | Yes |
- | [Qwen3-VL-Reranker-8B] | 8B | 36 | 32K | - | - | - | Yes |
+ | [Qwen3-VL-Embedding-2B](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B) | 2B | 28 | 32K | 2048 | Yes | Yes | Yes |
+ | [Qwen3-VL-Embedding-8B](https://huggingface.co/Qwen/Qwen3-VL-Embedding-8B) | 8B | 36 | 32K | 4096 | Yes | Yes | Yes |
+ | [Qwen3-VL-Reranker-2B](https://huggingface.co/Qwen/Qwen3-VL-Reranker-2B) | 2B | 28 | 32K | - | - | - | Yes |
+ | [Qwen3-VL-Reranker-8B](https://huggingface.co/Qwen/Qwen3-VL-Reranker-8B) | 8B | 36 | 32K | - | - | - | Yes |

 > **Note**:
 > - `Quantization Support` indicates whether post-processing quantization of the output embedding is supported.
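The model list above notes MRL support and user-defined output dimensions from 64 to 4096. A common way to use Matryoshka-style embeddings at a smaller size is to truncate the full vector and re-normalize before computing cosine similarity; the snippet below shows that post-processing on random NumPy placeholders (not real model outputs), ending with the same `print(similarity_scores.tolist())` call that appears in the README's usage example.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of MRL-style embeddings, then L2-normalize."""
    reduced = embeddings[:, :dim]
    return reduced / np.linalg.norm(reduced, axis=1, keepdims=True)

# Random placeholders shaped like the 8B model's full 4096-dim output.
query_embeddings = np.random.randn(2, 4096)
document_embeddings = np.random.randn(5, 4096)

# Shrink to a user-defined dimension anywhere in the supported 64-4096 range.
q = truncate_and_normalize(query_embeddings, dim=256)
d = truncate_and_normalize(document_embeddings, dim=256)

# On unit-norm vectors, cosine similarity is just a matrix product.
similarity_scores = q @ d.T
print(similarity_scores.tolist())
```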
@@ -63,8 +71,8 @@ Results on the MMEB-V2 benchmark. All models except IFM-TTE have been re-evaluat
 | IFM-TTE | 8B | 76.7 | 78.5 | 74.6 | 89.3 | 77.9 | 60.5 | 67.9 | 51.7 | 54.9 | 59.2 | 85.2 | 71.5 | 92.7 | 53.3 | 79.5 | 74.1 |
 | RzenEmbed | 8B | 70.6 | 71.7 | 78.5 | 92.1 | 75.9 | 58.8 | 63.5 | 51.0 | 45.5 | 55.7 | 89.7 | 60.7 | 88.7 | 69.9 | 81.3 | 72.9 |
 | Seed-1.6-embedding-1215 | unknown | 75.0 | 74.9 | 79.3 | 89.0 | 78.0 | 85.2 | 66.7 | 59.1 | 54.8 | 67.7 | 90.0 | 60.3 | 90.0 | 70.7 | 82.2 | 76.9 |
- | **Qwen3-VL-Embedding-2B** | 2B | 70.2 | 74.4 | 74.9 | 88.6 | 75.0 | 72.8 | 63.8 | 52.3 | 51.6 | 61.1 | 85.2 | 66.0 | 86.3 | 74.3 | 80.2 | 73.4 |
- | **Qwen3-VL-Embedding-8B** | 8B | 74.4 | 81.0 | 80.0 | 92.2 | 80.1 | 79.1 | 70.1 | 57.0 | 53.2 | 66.1 | 88.2 | 69.9 | 88.8 | 78.3 | 83.3 | **77.9** |
+ | **Qwen3-VL-Embedding-2B** | 2B | 70.3 | 74.3 | 74.8 | 88.5 | 75.0 | 71.9 | 64.9 | 53.9 | 53.3 | 61.9 | 84.4 | 65.3 | 86.4 | 69.4 | 79.2 | 73.2 |
+ | **Qwen3-VL-Embedding-8B** | 8B | 74.2 | 81.1 | 80.2 | 92.3 | 80.1 | 78.4 | 71.0 | 58.7 | 56.1 | 67.1 | 87.2 | 69.9 | 88.7 | 73.3 | 82.4 | **77.8** |

 ### Evaluation Results on [MMTEB](https://huggingface.co/spaces/mteb/leaderboard)

@@ -142,3 +150,15 @@ print(similarity_scores.tolist())
 
 For more usage examples, please visit our [GitHub repository](https://github.com/QwenLM/Qwen3-VL-Embedding).

+ ## Citation
+
+ If you find our work helpful, please consider citing it:
+
+ ```bibtex
+ @article{qwen3vlembedding,
+   title={Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking},
+   author={Li, Mingxin and Zhang, Yanzhao and Long, Dingkun and Chen, Keqin and Song, Sibo and Bai, Shuai and Yang, Zhibo and Xie, Pengjun and Yang, An and Liu, Dayiheng and Zhou, Jingren and Lin, Junyang},
+   journal={arXiv},
+   year={2026}
+ }
+ ```