ColQwen3.5-4.5B-v3

Visual document retrieval model using ColBERT-style late interaction, built on Qwen3.5-4B

4.5B parameters | 320-dim embeddings | LoRA (r=16, alpha=64) | BF16

What's New in V3

  • Automated hyperparameter optimization (Optuna, 16 trials across 3 nodes)
  • Optimized LoRA config: r=16, alpha=64 (alpha/r=4.0), cosine scheduler
  • Model soup with V2 via per-layer evolutionary merge optimization
  • Significant benchmark improvement over V2 (+0.0219)

Benchmark Results

ViDoRe V1 (nDCG@5)

| Task | ColQwen3.5-v3 | Nemotron 4B | ColQwen3.5-v2 | TomoroAI 4B | Jina v4 |
|---|---|---|---|---|---|
| ArxivQA | 0.9179 | 0.9203 | 0.9155 | 0.9066 | 0.8846 |
| DocVQA | 0.6658 | 0.6739 | 0.6610 | 0.6624 | 0.6014 |
| InfoVQA | 0.9359 | 0.9331 | 0.9356 | 0.9429 | 0.9379 |
| ShiftProject | 0.9039 | 0.9226 | 0.9404 | 0.8739 | 0.9293 |
| SynDocQA AI | 1.0000 | 0.9926 | 1.0000 | 0.9926 | 0.9926 |
| SynDocQA Energy | 0.9712 | 0.9619 | 0.9739 | 0.9691 | 0.9726 |
| SynDocQA Gov | 0.9729 | 0.9802 | 0.9742 | 0.9717 | 0.9659 |
| SynDocQA Health | 0.9889 | 0.9852 | 0.9889 | 0.9963 | 0.9913 |
| Tabfquad | 0.9599 | 0.9805 | 0.9453 | 0.9433 | 0.9560 |
| Tatdqa | 0.8394 | 0.8119 | 0.8377 | 0.7983 | 0.8035 |
| Average | 0.9156 | 0.9162 | 0.9172 | 0.9057 | 0.9035 |

ViDoRe V3 — English (nDCG@5)

| Task | ColQwen3.5-v3 | TomoroAI 4B | ColQwen3.5-v2 | Nemotron 3B | Jina v4 |
|---|---|---|---|---|---|
| ComputerScience | 0.7709 | 0.7419 | 0.7716 | 0.7514 | 0.7175 |
| Energy | 0.6330 | 0.6023 | 0.6493 | 0.5838 | 0.5842 |
| FinanceEn | 0.6584 | 0.6753 | 0.6584 | 0.6712 | 0.6417 |
| FinanceFr | 0.4311 | 0.4202 | 0.4290 | 0.3730 | 0.3859 |
| Hr | 0.6402 | 0.6037 | 0.6316 | 0.6256 | 0.6206 |
| Industrial | 0.5780 | 0.5787 | 0.5527 | 0.5447 | 0.5443 |
| Pharmaceuticals | 0.6538 | 0.6612 | 0.6506 | 0.6524 | 0.6303 |
| Physics | 0.4619 | 0.4640 | 0.4752 | 0.4128 | 0.4191 |
| Average | 0.6034 | 0.5934 | 0.6023 | 0.5769 | 0.5680 |

ViDoRe V3 — Multilingual (nDCG@5)

| Task | ColQwen3.5-v3 | TomoroAI 4B | ColQwen3.5-v2 | Nemotron 3B | Jina v4 |
|---|---|---|---|---|---|
| ComputerScience | 0.7543 | 0.7419 | 0.7538 | 0.7514 | 0.7175 |
| Energy | 0.6692 | 0.6023 | 0.6918 | 0.5838 | 0.5842 |
| FinanceEn | 0.5925 | 0.6753 | 0.5923 | 0.6712 | 0.6417 |
| FinanceFr | 0.4759 | 0.4202 | 0.4782 | 0.3730 | 0.3859 |
| Hr | 0.5971 | 0.6037 | 0.5858 | 0.6256 | 0.6206 |
| Industrial | 0.5268 | 0.5787 | 0.5111 | 0.5447 | 0.5443 |
| Pharmaceuticals | 0.6350 | 0.6612 | 0.6324 | 0.6524 | 0.6303 |
| Physics | 0.4734 | 0.4640 | 0.4853 | 0.4128 | 0.4191 |
| Average | 0.5905 | 0.5934 | 0.5913 | 0.5769 | 0.5680 |

ViDoRe V3 — English (nDCG@10)

| Task | ColQwen3.5-v3 | Nemotron 4B | ColQwen3.5-v2 | TomoroAI 4B | Jina v4 |
|---|---|---|---|---|---|
| ComputerScience | 0.7978 | 0.7856 | 0.7991 | 0.7544 | 0.7175 |
| Energy | 0.6619 | 0.6748 | 0.6821 | 0.6643 | 0.5842 |
| FinanceEn | 0.6876 | 0.6502 | 0.6845 | 0.6384 | 0.6417 |
| FinanceFr | 0.4712 | 0.4901 | 0.4649 | 0.4683 | 0.3859 |
| Hr | 0.6618 | 0.6239 | 0.6558 | 0.6009 | 0.6206 |
| Industrial | 0.5948 | 0.5391 | 0.5779 | 0.5358 | 0.5443 |
| Pharmaceuticals | 0.6734 | 0.6610 | 0.6676 | 0.6574 | 0.6303 |
| Physics | 0.5013 | 0.4886 | 0.5062 | 0.4932 | 0.4191 |
| Average | 0.6312 | 0.6142 | 0.6297 | 0.6016 | 0.5680 |

ViDoRe V3 — Multilingual (nDCG@10)

| Task | ColQwen3.5-v3 | Nemotron 4B | ColQwen3.5-v2 | TomoroAI 4B | Jina v4 |
|---|---|---|---|---|---|
| ComputerScience | 0.7866 | 0.7856 | 0.7861 | 0.7544 | 0.7175 |
| Energy | 0.7000 | 0.6748 | 0.7146 | 0.6643 | 0.5842 |
| FinanceEn | 0.6195 | 0.6502 | 0.6239 | 0.6384 | 0.6417 |
| FinanceFr | 0.5131 | 0.4901 | 0.5110 | 0.4683 | 0.3859 |
| Hr | 0.6183 | 0.6239 | 0.6087 | 0.6009 | 0.6206 |
| Industrial | 0.5480 | 0.5391 | 0.5293 | 0.5358 | 0.5443 |
| Pharmaceuticals | 0.6540 | 0.6610 | 0.6541 | 0.6574 | 0.6303 |
| Physics | 0.5047 | 0.4886 | 0.5138 | 0.4932 | 0.4191 |
| Average | 0.6180 | 0.6142 | 0.6177 | 0.6016 | 0.5680 |

ViDoRe V2 — English (nDCG@5)

| Task | ColQwen3.5-v3 | Nemotron 3B | TomoroAI 4B | ColQwen3.5-v2 | Jina v4 |
|---|---|---|---|---|---|
| BioMedicalLectures | 0.6769 | 0.6518 | 0.6718 | 0.6919 | 0.6359 |
| ESGReportsHL | 0.7332 | 0.7538 | 0.7465 | 0.6957 | 0.6512 |
| ESGReports | 0.5986 | 0.6030 | 0.6300 | 0.6146 | 0.5194 |
| EconomicsReports | 0.6417 | 0.6619 | 0.5910 | 0.6204 | 0.5955 |
| Average | 0.6626 | 0.6676 | 0.6598 | 0.6557 | 0.6005 |

ViDoRe V2 — Multilingual (nDCG@5)

| Task | ColQwen3.5-v3 | Nemotron 3B | TomoroAI 4B | ColQwen3.5-v2 | Jina v4 |
|---|---|---|---|---|---|
| BioMedicalLectures | 0.6481 | 0.6518 | 0.6718 | 0.6438 | 0.6359 |
| ESGReports | 0.5748 | 0.6030 | 0.6300 | 0.5799 | 0.5194 |
| EconomicsReports | 0.5852 | 0.6619 | 0.5910 | 0.5722 | 0.5955 |
| Average | 0.6027 | 0.6389 | 0.6309 | 0.5986 | 0.5836 |

Key Results

| Model | V1 @5 | V3 Eng @5 | V3 Eng @10 | V3 Multi @5 | V3 Multi @10 | V2 Eng @5 |
|---|---|---|---|---|---|---|
| ColQwen3.5-v3 | 0.9156 | 0.6034 | 0.6312 | 0.5905 | 0.6180 | 0.6626 |
| ColQwen3.5-v2 | 0.9172 | 0.6023 | 0.6297 | 0.5913 | 0.6177 | 0.6557 |
| Nemotron-4B | 0.9162 | — | 0.6142 | — | 0.6142 | 0.6676 |
| TomoroAI 4B | 0.9057 | 0.5934 | 0.6016 | 0.5934 | 0.6016 | 0.6598 |
  • V3 English @10: #1 across all 4B models (0.6312, +0.0170 over Nemotron)
  • V3 English @5: #1 across all 4B models (0.6034, +0.0100 over TomoroAI)
  • V2 English: 0.6626, nearly matching Nemotron (0.6676)
  • V1: passes kill criterion (0.9156 > 0.9130)

Limitations

  • Trails TomoroAI on FinanceEn, Industrial, Pharmaceuticals (domain-specific data advantage)
  • Trails Nemotron on V2 English benchmark (0.6626 vs 0.6676)
  • ViDoRe V1 average slightly below ColQwen3.5-v2 (0.9172) and Nemotron (0.9162)

Usage

import torch
from PIL import Image
from colpali_engine.models import ColQwen3_5, ColQwen3_5Processor

model = ColQwen3_5.from_pretrained(
    "athrael-soju/colqwen3.5-4.5B-v3",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    attn_implementation="sdpa",
)
processor = ColQwen3_5Processor.from_pretrained("athrael-soju/colqwen3.5-4.5B-v3")

# Embed document images
images = [Image.open("page1.png"), Image.open("page2.png")]
batch = processor.process_images(images).to(model.device)
with torch.no_grad():
    doc_embeddings = model(**batch)

# Embed queries
queries = ["What is the revenue for Q4?", "Show me the organizational chart"]
batch = processor.process_queries(queries).to(model.device)
with torch.no_grad():
    model.rope_deltas = None  # clear cached RoPE deltas before a new forward pass (see Technical Notes)
    query_embeddings = model(**batch)

# Score with MaxSim
scores = processor.score(query_embeddings, doc_embeddings)
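`processor.score` implements ColBERT-style MaxSim late interaction. As a minimal sketch (not the library's exact implementation, which also handles padding and batching), the same computation in plain PyTorch, assuming stacked query and document embedding tensors of shape (batch, tokens, dim):

```python
import torch

def maxsim(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """ColBERT-style late interaction.

    query_embs: (num_queries, q_tokens, dim); doc_embs: (num_docs, d_tokens, dim).
    For each query token, take the max similarity over all document tokens,
    then sum over query tokens.
    """
    # Token-level similarities: (num_queries, num_docs, q_tokens, d_tokens)
    sim = torch.einsum("qnd,pmd->qpnm", query_embs, doc_embs)
    # Max over document tokens, then sum over query tokens -> (num_queries, num_docs)
    return sim.max(dim=-1).values.sum(dim=-1)
```

The resulting matrix can be ranked per query with `scores.argsort(dim=-1, descending=True)` to retrieve the best-matching pages.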

Training

Pipeline

  1. HPO search: 16 Optuna trials (multi-objective V1+V3), found optimal LoRA config
  2. Full training: 3 seeds (42, 123, 456) with HPO-selected hyperparameters, 1 epoch
  3. Seed merge: full state dict averaging (3 seeds into 1)
  4. Model soup: per-layer evolutionary merge with V2 (11 Optuna trials, 14 merge parameters)
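Step 3 above (and the PEFT caveat under Technical Notes) amounts to a uniform average of the three seed checkpoints' state dicts. A minimal sketch, assuming the checkpoints have already been loaded into memory:

```python
import torch

def average_state_dicts(state_dicts):
    """Uniformly average N state dicts with identical keys and shapes
    (here: merge the three seed checkpoints into one)."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged
```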

Training Data (~776K pairs)

  • vidore/colpali_train_set: 127K
  • openbmb/VisRAG-Ret-Train-Synthetic-data: 239K
  • openbmb/VisRAG-Ret-Train-In-domain-data: 123K
  • llamaindex/vdr-multilingual-train: ~270K (5 languages)
  • vidore/tatdqa_train: ~13K (finance)
  • Metric-AI/tabfquad_train_set: ~1.5K (tables)

Hyperparameters (from HPO)

| Parameter | Value |
|---|---|
| LoRA r | 16 |
| LoRA alpha | 64 (alpha/r = 4.0) |
| LR | 4.57e-5 |
| Scheduler | cosine |
| Dropout | 0.197 |
| Warmup | 8% |
| Weight decay | 0.02 |
| Batch size | 32 |
| Hard negatives | 2/sample |
| Seeds | 42, 123, 456 |
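The scheduler settings above (cosine decay, 8% warmup, peak LR 4.57e-5) correspond to a schedule like the following sketch; `lr_at` is an illustrative helper, not part of the actual training code:

```python
import math

def lr_at(step, total_steps, base_lr=4.57e-5, warmup_frac=0.08):
    """Linear warmup over the first 8% of steps, then cosine decay to zero."""
    warmup = max(int(total_steps * warmup_frac), 1)
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(total_steps - warmup, 1)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```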

Technical Notes

  • PEFT's add_weighted_adapter is broken for ColQwen3.5 (both DARE-TIES and linear). Use full state dict averaging for seed merging.
  • Model soup done via direct state dict interpolation with per-layer weights optimized by Optuna.
  • B200/Blackwell GPUs require Conv3d to F.linear monkey-patch.
  • Always clear rope_deltas before forward passes with hard negatives.
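The per-layer evolutionary merge described above can be sketched as direct state-dict interpolation with one blend weight per layer proposed by the Optuna search. `layer_of`, which maps a parameter name to its layer index, is a hypothetical helper; the real parameter grouping is not documented here:

```python
import torch

def per_layer_soup(sd_a, sd_b, layer_weights, layer_of):
    """Blend two state dicts layer by layer: w * A + (1 - w) * B,
    where w is the per-layer weight chosen by the optimizer."""
    merged = {}
    for name, tensor_a in sd_a.items():
        w = layer_weights[layer_of(name)]
        merged[name] = w * tensor_a + (1.0 - w) * sd_b[name]
    return merged
```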

Transparency

The complete evaluation trail from V1, V2, and V3 development is available at athrael-soju/colqwen-optimization-trail. This includes every intermediate evaluation showing which candidates were tried, what scores they got, and which were selected for publication. All selection decisions were evaluated against the same public ViDoRe benchmarks used for final reporting.

Citation

@misc{colqwen35v3,
  title={ColQwen3.5-v3: Visual Document Retrieval with HPO and Evolutionary Model Soups},
  author={athrael-soju},
  year={2026},
  url={https://huggingface.co/athrael-soju/colqwen3.5-4.5B-v3}
}

License

Apache 2.0
