# tCHu Qwen3-VL-2B LoRA Adapter
A LoRA adapter for Qwen3-VL-2B-Instruct finetuned to play tCHu (Swiss Ticket to Ride) from game screenshots.
## Model Description
This adapter was trained via supervised finetuning (SFT) to distill the knowledge of an MLP policy network into a vision-language model. The VLM sees a 640x410 screenshot of the game board and outputs a tool-call action.
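To make the distillation setup concrete, a single SFT sample pairs a board screenshot with the teacher's chosen action as the target completion. The sketch below is illustrative only: the field names, file path, and prompt text are assumptions, not the repo's actual schema.

```python
# Illustrative sketch of one SFT distillation sample (field names, path, and
# prompt text are assumptions, not the repo's actual schema): the board
# screenshot is the input, and the teacher's action string is the target.
sample = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": "screenshots/game_017_turn_042.png"},
                {"type": "text", "text": "You are playing tCHu. Choose your next action."},
            ],
        },
        # Supervision target: the MLP teacher's action, serialized as a tool call
        {"role": "assistant", "content": 'claim_route(route_id="BER_LUC_1")'},
    ]
}

# As in standard SFT, the loss is computed only on the assistant turn.
target = sample["messages"][-1]["content"]
```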
## Training Details
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-VL-2B-Instruct |
| Method | LoRA SFT (r=16, alpha=32, dropout=0.05) |
| Quantization | 4-bit NF4 (bitsandbytes) |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | 17.4M / 2.1B (0.8%) |
| Training data | 3,864 decision points from 50 self-play games |
| Image resolution | 640x410 |
| Epochs | 1 |
| Final loss | 0.627 |
| Training regime | bf16 mixed precision |
| GPU | NVIDIA RTX 4060 (8GB) |
| Training time | ~90 minutes |
| PEFT version | 0.17.1 |
### Teacher Model
The training data was generated by a TicketHybridPlayer, an MLP policy network (BC Round 4, 512 hidden units) with ticket-aware logit boosting (boost_weight=120). Both players in the self-play games used the same model, averaging 72.4 points per game.
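The boosting step can be sketched as follows. This is a pure-Python sketch of the idea, not the player's actual implementation; only `boost_weight=120` comes from the description above, and the function and variable names are illustrative.

```python
def boost_ticket_logits(logits, action_route_ids, ticket_route_ids, boost_weight=120.0):
    """Add a constant bonus to the logits of claim-route actions whose route
    lies on one of the player's destination-ticket paths.

    Sketch only: names and data layout are assumptions, not tCHuPy's code."""
    boosted = list(logits)
    for i, route_id in enumerate(action_route_ids):
        if route_id is not None and route_id in ticket_route_ids:
            boosted[i] += boost_weight
    return boosted

# Example: three actions; only the second is a claim on a ticket path.
logits = [1.2, 0.4, 2.0]
route_ids = [None, "BER_LUC_1", "GEN_LAU_1"]   # None = non-claim action
boosted = boost_ticket_logits(logits, route_ids, {"BER_LUC_1"})
# boosted[1] is now ~120.4, so the ticket-relevant claim dominates the argmax
```

A large additive bonus like this effectively overrides the raw policy whenever a ticket-path route is claimable, which matches the "hybrid" naming.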
## Action Format
The model outputs MCP tool-call strings:
- `draw_card(slot=-1)` → draw from the deck
- `draw_card(slot=2)` → draw the face-up card at position 2
- `claim_route(route_id="BER_LUC_1")` → claim the Berne-Lucerne route
- `draw_tickets()` → draw new destination tickets
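Strings in this shape can be parsed with a small regex. The sketch below is one way to do it; tCHuPy's actual parser may be stricter or differ entirely.

```python
import re

# Minimal parser for the action strings above (a sketch, not tCHuPy's parser).
# Returns (tool_name, kwargs) for a well-formed call, or None otherwise.
_CALL_RE = re.compile(r'^(\w+)\((.*)\)$')
_ARG_RE = re.compile(r'(\w+)\s*=\s*("([^"]*)"|-?\d+)')

def parse_action(text):
    m = _CALL_RE.match(text.strip())
    if not m:
        return None
    name, argstr = m.group(1), m.group(2)
    kwargs = {}
    for am in _ARG_RE.finditer(argstr):
        key, raw, quoted = am.group(1), am.group(2), am.group(3)
        # Quoted values stay strings; bare values are integers (e.g. slot=-1)
        kwargs[key] = quoted if quoted is not None else int(raw)
    return name, kwargs

parse_action('draw_card(slot=-1)')                  # ('draw_card', {'slot': -1})
parse_action('claim_route(route_id="BER_LUC_1")')   # ('claim_route', {'route_id': 'BER_LUC_1'})
parse_action('draw_tickets()')                      # ('draw_tickets', {})
```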
## Usage

### With tCHuPy (game GUI)
```shell
git clone https://github.com/DavidLacour/tCHuPy
cd tCHuPy

# Install dependencies first (huggingface_hub comes in with transformers)
pip install pygame torch transformers peft bitsandbytes accelerate

# Download this adapter
python -c "from huggingface_hub import snapshot_download; snapshot_download('DavidLacour/tchu-qwen3-vl-2b-lora', local_dir='models/distilled_bc4_vlm/checkpoint-distilled')"

# Play against the VLM
python -m tchu.gui --opponent=qwen
```
### Standalone Loading
```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization, matching the training configuration above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

base_model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3-VL-2B-Instruct",
    quantization_config=bnb_config,
    device_map={"": 0},
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(base_model, "DavidLacour/tchu-qwen3-vl-2b-lora")
processor = AutoProcessor.from_pretrained("DavidLacour/tchu-qwen3-vl-2b-lora")
```
## Limitations
- Trained on only 3,864 samples (1 epoch), so generalization is limited
- Falls back to a heuristic strategy when the VLM output is unparseable or illegal
- Requires ~4GB VRAM with 4-bit quantization
- Action distribution in training data is skewed (80% draw, 20% claim, 0% tickets)
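The parse-and-fall-back behaviour mentioned above can be sketched like this. The legality check and the heuristic are illustrative placeholders, not tCHuPy's actual code.

```python
def choose_action(vlm_output, legal_actions, heuristic_action):
    """Use the VLM's action if it is well-formed and legal; otherwise fall
    back to a heuristic move (sketch; names are illustrative)."""
    action = vlm_output.strip()
    if action in legal_actions:
        return action
    return heuristic_action

legal = {"draw_card(slot=-1)", "draw_card(slot=0)", "draw_tickets()"}
ok = choose_action("draw_card(slot=-1)", legal, "draw_tickets()")       # VLM action used
bad = choose_action("claim_route(route_id=???)", legal, "draw_tickets()")  # falls back
```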
## About tCHu
tCHu is the Swiss version of Ticket to Ride: a two-player railway board game set on a map of Switzerland with 51 stations, 88 routes, and 46 destination tickets. See the tCHuPy repository for the full game implementation.