# ColQwen3.5-4.5B-v3

Visual document retrieval model using ColBERT-style late interaction, built on Qwen3.5-4B.

**4.5B parameters | 320-dim embeddings | LoRA (r=16, alpha=64) | BF16**
## What's New in V3

- Automated hyperparameter optimization (Optuna, 16 trials across 3 nodes)
- Optimized LoRA config: r=16, alpha=64 (alpha/r = 4.0), cosine scheduler
- Model soup with V2 via per-layer evolutionary merge optimization
- Significant benchmark improvement over V2 (+0.0219)
## Benchmark Results

### ViDoRe V1 (nDCG@5)

| Task | ColQwen3.5-v3 | Nemotron 4B | ColQwen3.5-v2 | TomoroAI 4B | Jina v4 |
|---|---|---|---|---|---|
| ArxivQA | 0.9179 | 0.9203 | 0.9155 | 0.9066 | 0.8846 |
| DocVQA | 0.6658 | 0.6739 | 0.6610 | 0.6624 | 0.6014 |
| InfoVQA | 0.9359 | 0.9331 | 0.9356 | 0.9429 | 0.9379 |
| ShiftProject | 0.9039 | 0.9226 | 0.9404 | 0.8739 | 0.9293 |
| SynDocQA AI | 1.0000 | 0.9926 | 1.0000 | 0.9926 | 0.9926 |
| SynDocQA Energy | 0.9712 | 0.9619 | 0.9739 | 0.9691 | 0.9726 |
| SynDocQA Gov | 0.9729 | 0.9802 | 0.9742 | 0.9717 | 0.9659 |
| SynDocQA Health | 0.9889 | 0.9852 | 0.9889 | 0.9963 | 0.9913 |
| Tabfquad | 0.9599 | 0.9805 | 0.9453 | 0.9433 | 0.9560 |
| Tatdqa | 0.8394 | 0.8119 | 0.8377 | 0.7983 | 0.8035 |
| **Average** | 0.9156 | 0.9162 | 0.9172 | 0.9057 | 0.9035 |
### ViDoRe V3 – English (nDCG@5)

| Task | ColQwen3.5-v3 | TomoroAI 4B | ColQwen3.5-v2 | Nemotron 3B | Jina v4 |
|---|---|---|---|---|---|
| ComputerScience | 0.7709 | 0.7419 | 0.7716 | 0.7514 | 0.7175 |
| Energy | 0.6330 | 0.6023 | 0.6493 | 0.5838 | 0.5842 |
| FinanceEn | 0.6584 | 0.6753 | 0.6584 | 0.6712 | 0.6417 |
| FinanceFr | 0.4311 | 0.4202 | 0.4290 | 0.3730 | 0.3859 |
| Hr | 0.6402 | 0.6037 | 0.6316 | 0.6256 | 0.6206 |
| Industrial | 0.5780 | 0.5787 | 0.5527 | 0.5447 | 0.5443 |
| Pharmaceuticals | 0.6538 | 0.6612 | 0.6506 | 0.6524 | 0.6303 |
| Physics | 0.4619 | 0.4640 | 0.4752 | 0.4128 | 0.4191 |
| **Average** | 0.6034 | 0.5934 | 0.6023 | 0.5769 | 0.5680 |
### ViDoRe V3 – Multilingual (nDCG@5)

| Task | ColQwen3.5-v3 | TomoroAI 4B | ColQwen3.5-v2 | Nemotron 3B | Jina v4 |
|---|---|---|---|---|---|
| ComputerScience | 0.7543 | 0.7419 | 0.7538 | 0.7514 | 0.7175 |
| Energy | 0.6692 | 0.6023 | 0.6918 | 0.5838 | 0.5842 |
| FinanceEn | 0.5925 | 0.6753 | 0.5923 | 0.6712 | 0.6417 |
| FinanceFr | 0.4759 | 0.4202 | 0.4782 | 0.3730 | 0.3859 |
| Hr | 0.5971 | 0.6037 | 0.5858 | 0.6256 | 0.6206 |
| Industrial | 0.5268 | 0.5787 | 0.5111 | 0.5447 | 0.5443 |
| Pharmaceuticals | 0.6350 | 0.6612 | 0.6324 | 0.6524 | 0.6303 |
| Physics | 0.4734 | 0.4640 | 0.4853 | 0.4128 | 0.4191 |
| **Average** | 0.5905 | 0.5934 | 0.5913 | 0.5769 | 0.5680 |
### ViDoRe V3 – English (nDCG@10)

| Task | ColQwen3.5-v3 | Nemotron 4B | ColQwen3.5-v2 | TomoroAI 4B | Jina v4 |
|---|---|---|---|---|---|
| ComputerScience | 0.7978 | 0.7856 | 0.7991 | 0.7544 | 0.7175 |
| Energy | 0.6619 | 0.6748 | 0.6821 | 0.6643 | 0.5842 |
| FinanceEn | 0.6876 | 0.6502 | 0.6845 | 0.6384 | 0.6417 |
| FinanceFr | 0.4712 | 0.4901 | 0.4649 | 0.4683 | 0.3859 |
| Hr | 0.6618 | 0.6239 | 0.6558 | 0.6009 | 0.6206 |
| Industrial | 0.5948 | 0.5391 | 0.5779 | 0.5358 | 0.5443 |
| Pharmaceuticals | 0.6734 | 0.6610 | 0.6676 | 0.6574 | 0.6303 |
| Physics | 0.5013 | 0.4886 | 0.5062 | 0.4932 | 0.4191 |
| **Average** | 0.6312 | 0.6142 | 0.6297 | 0.6016 | 0.5680 |
### ViDoRe V3 – Multilingual (nDCG@10)

| Task | ColQwen3.5-v3 | Nemotron 4B | ColQwen3.5-v2 | TomoroAI 4B | Jina v4 |
|---|---|---|---|---|---|
| ComputerScience | 0.7866 | 0.7856 | 0.7861 | 0.7544 | 0.7175 |
| Energy | 0.7000 | 0.6748 | 0.7146 | 0.6643 | 0.5842 |
| FinanceEn | 0.6195 | 0.6502 | 0.6239 | 0.6384 | 0.6417 |
| FinanceFr | 0.5131 | 0.4901 | 0.5110 | 0.4683 | 0.3859 |
| Hr | 0.6183 | 0.6239 | 0.6087 | 0.6009 | 0.6206 |
| Industrial | 0.5480 | 0.5391 | 0.5293 | 0.5358 | 0.5443 |
| Pharmaceuticals | 0.6540 | 0.6610 | 0.6541 | 0.6574 | 0.6303 |
| Physics | 0.5047 | 0.4886 | 0.5138 | 0.4932 | 0.4191 |
| **Average** | 0.6180 | 0.6142 | 0.6177 | 0.6016 | 0.5680 |
### ViDoRe V2 – English (nDCG@5)

| Task | ColQwen3.5-v3 | Nemotron 3B | TomoroAI 4B | ColQwen3.5-v2 | Jina v4 |
|---|---|---|---|---|---|
| BioMedicalLectures | 0.6769 | 0.6518 | 0.6718 | 0.6919 | 0.6359 |
| ESGReportsHL | 0.7332 | 0.7538 | 0.7465 | 0.6957 | 0.6512 |
| ESGReports | 0.5986 | 0.6030 | 0.6300 | 0.6146 | 0.5194 |
| EconomicsReports | 0.6417 | 0.6619 | 0.5910 | 0.6204 | 0.5955 |
| **Average** | 0.6626 | 0.6676 | 0.6598 | 0.6557 | 0.6005 |
### ViDoRe V2 – Multilingual (nDCG@5)

| Task | ColQwen3.5-v3 | Nemotron 3B | TomoroAI 4B | ColQwen3.5-v2 | Jina v4 |
|---|---|---|---|---|---|
| BioMedicalLectures | 0.6481 | 0.6518 | 0.6718 | 0.6438 | 0.6359 |
| ESGReports | 0.5748 | 0.6030 | 0.6300 | 0.5799 | 0.5194 |
| EconomicsReports | 0.5852 | 0.6619 | 0.5910 | 0.5722 | 0.5955 |
| **Average** | 0.6027 | 0.6389 | 0.6309 | 0.5986 | 0.5836 |
## Key Results

| Model | V1 @5 | V3 Eng @5 | V3 Eng @10 | V3 Multi @5 | V3 Multi @10 | V2 Eng @5 |
|---|---|---|---|---|---|---|
| ColQwen3.5-v3 | 0.9156 | 0.6034 | 0.6312 | 0.5905 | 0.6180 | 0.6626 |
| ColQwen3.5-v2 | 0.9172 | 0.6023 | 0.6297 | 0.5913 | 0.6177 | 0.6557 |
| Nemotron-4B | 0.9162 | – | 0.6142 | – | 0.6142 | 0.6676 |
| TomoroAI 4B | 0.9057 | 0.5934 | 0.6016 | 0.5934 | 0.6016 | 0.6598 |
- V3 English @10: best among 4B-class models (0.6312, +0.0170 over Nemotron)
- V3 English @5: best among 4B-class models (0.6034, +0.0100 over TomoroAI)
- V2 English: 0.6626, nearly matching Nemotron (0.6676)
- V1: clears the kill criterion (0.9156 > 0.9130)
## Limitations

- Trails TomoroAI 4B on FinanceEn, Industrial, and Pharmaceuticals, likely reflecting TomoroAI's domain-specific training data advantage
- Trails Nemotron on the ViDoRe V2 English benchmark (0.6626 vs 0.6676)
- ViDoRe V1 average is slightly below ColQwen3.5-v2 (0.9172) and Nemotron (0.9162)
## Usage

```python
import torch
from PIL import Image
from colpali_engine.models import ColQwen3_5, ColQwen3_5Processor

model = ColQwen3_5.from_pretrained(
    "athrael-soju/colqwen3.5-4.5B-v3",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    attn_implementation="sdpa",
)
processor = ColQwen3_5Processor.from_pretrained("athrael-soju/colqwen3.5-4.5B-v3")

# Embed document page images
images = [Image.open("page1.png"), Image.open("page2.png")]
batch = processor.process_images(images).to(model.device)
with torch.no_grad():
    doc_embeddings = model(**batch)

# Embed queries
queries = ["What is the revenue for Q4?", "Show me the organizational chart"]
batch = processor.process_queries(queries).to(model.device)
with torch.no_grad():
    model.rope_deltas = None  # clear cached RoPE offsets from the image pass
    query_embeddings = model(**batch)

# Late-interaction scores: one row per query, one column per page
scores = processor.score(query_embeddings, doc_embeddings)
```
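Under the hood, `processor.score` applies ColBERT-style late interaction (MaxSim): for each query token, take the maximum similarity against all document patch embeddings, then sum over query tokens. A minimal pure-Python sketch of that rule, using toy 2-dim embeddings and dot-product similarity (the real implementation operates on batched tensors):

```python
def maxsim_score(query_emb, doc_emb):
    """ColBERT-style late interaction: for each query token, take the max
    dot-product similarity over all document tokens, then sum the maxima."""
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))
    return sum(max(dot(q, d) for d in doc_emb) for q in query_emb)

query = [[1.0, 0.0], [0.0, 1.0]]            # 2 query-token embeddings
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # 3 page-patch embeddings

score = maxsim_score(query, doc)
# each query token matches its best patch: 0.9 + 0.8 = 1.7
```

Because every query token is matched independently, a page can score highly even when the relevant evidence is scattered across different regions of the image.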
## Training

### Pipeline
- HPO search: 16 Optuna trials (multi-objective V1+V3), found optimal LoRA config
- Full training: 3 seeds (42, 123, 456) with HPO-selected hyperparameters, 1 epoch
- Seed merge: full state dict averaging (3 seeds into 1)
- Model soup: per-layer evolutionary merge with V2 (11 Optuna trials, 14 merge parameters)
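The seed-merge step above (full state dict averaging) can be sketched as follows. Plain Python floats stand in for tensors and the parameter names are illustrative; with torch, the same parameter-wise arithmetic applies to each tensor:

```python
def average_state_dicts(state_dicts):
    """Uniformly average N checkpoints parameter-by-parameter.
    All state dicts are assumed to share the same keys/shapes."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}

# Toy "checkpoints" from the three training seeds (42, 123, 456)
sd_42 = {"layer.weight": 1.0, "layer.bias": 0.0}
sd_123 = {"layer.weight": 2.0, "layer.bias": 0.3}
sd_456 = {"layer.weight": 3.0, "layer.bias": 0.6}

merged = average_state_dicts([sd_42, sd_123, sd_456])
# merged["layer.weight"] == 2.0, merged["layer.bias"] == 0.3
```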
### Training Data (~776K pairs)
- vidore/colpali_train_set: 127K
- openbmb/VisRAG-Ret-Train-Synthetic-data: 239K
- openbmb/VisRAG-Ret-Train-In-domain-data: 123K
- llamaindex/vdr-multilingual-train: ~270K (5 languages)
- vidore/tatdqa_train: ~13K (finance)
- Metric-AI/tabfquad_train_set: ~1.5K (tables)
### Hyperparameters (from HPO)

| Parameter | Value |
|---|---|
| LoRA r | 16 |
| LoRA alpha | 64 (alpha/r = 4.0) |
| LR | 4.57e-5 |
| Scheduler | cosine |
| Dropout | 0.197 |
| Warmup | 8% |
| Weight decay | 0.02 |
| Batch size | 32 |
| Hard negatives | 2/sample |
| Seeds | 42, 123, 456 |
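For reference, the warmup-plus-cosine schedule implied by the table looks like the sketch below: linear warmup over the first 8% of steps, then cosine decay to zero. This is a generic sketch of the schedule shape, not the exact trainer implementation:

```python
import math

def lr_at(step, total_steps, base_lr=4.57e-5, warmup_frac=0.08):
    """Linear warmup for the first warmup_frac of steps, then cosine decay."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Ramp linearly from ~0 up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```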
## Technical Notes

- PEFT's `add_weighted_adapter` is broken for ColQwen3.5 (both DARE-TIES and linear modes). Use full state dict averaging for seed merging instead.
- The model soup is done via direct state dict interpolation, with per-layer weights optimized by Optuna.
- B200/Blackwell GPUs require a Conv3d-to-`F.linear` monkey-patch.
- Always clear `rope_deltas` before forward passes with hard negatives.
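The per-layer state dict interpolation used for the soup can be sketched as below. Toy float values stand in for tensors, and the layer names and the `layer_of` key-mapping helper are hypothetical; the actual per-layer alphas were chosen by Optuna:

```python
def per_layer_soup(sd_a, sd_b, alphas, layer_of):
    """Merge two state dicts with one mixing weight per layer:
    merged = alpha_l * A + (1 - alpha_l) * B for every parameter in layer l."""
    return {
        k: alphas[layer_of(k)] * sd_a[k] + (1.0 - alphas[layer_of(k)]) * sd_b[k]
        for k in sd_a
    }

sd_a = {"layers.0.w": 1.0, "layers.1.w": 1.0}  # e.g. the seed-merged V3
sd_b = {"layers.0.w": 0.0, "layers.1.w": 0.0}  # e.g. the V2 checkpoint
alphas = {0: 0.25, 1: 0.75}                    # one mixing weight per layer

# Hypothetical mapping from parameter name to layer index
layer_of = lambda key: int(key.split(".")[1])

merged = per_layer_soup(sd_a, sd_b, alphas, layer_of)
# layers.0.w -> 0.25, layers.1.w -> 0.75
```

Per-layer weights give the merge optimizer more freedom than a single global alpha: layers where V2 was stronger can lean toward V2 while the rest keep the V3 weights.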
## Transparency
The complete evaluation trail from V1, V2, and V3 development is available at athrael-soju/colqwen-optimization-trail. This includes every intermediate evaluation showing which candidates were tried, what scores they got, and which were selected for publication. All selection decisions were evaluated against the same public ViDoRe benchmarks used for final reporting.
## Citation

```bibtex
@misc{colqwen35v3,
  title={ColQwen3.5-v3: Visual Document Retrieval with HPO and Evolutionary Model Soups},
  author={athrael-soju},
  year={2026},
  url={https://huggingface.co/athrael-soju/colqwen3.5-4.5B-v3}
}
```
## License
Apache 2.0