# EgoNormia-Cosmos-Reason2-2B-v6b-shortcot
Multi-task SFT fine-tune of nvidia/Cosmos-Reason2-2B on the EgoNormia social norm benchmark. This v6b run extends the MCQ setup with generation tasks: 3 MCQ tasks plus action / justification / sensibility generation, trained with short CoT traces.
## Training
| Parameter | Value |
|---|---|
| Base model | nvidia/Cosmos-Reason2-2B (Qwen3-VL-2B) |
| Tasks | 6 tasks = 3 MCQ + action_gen + justification_gen + sensibility_gen |
| Train samples | 9780 |
| Training file | data/egonormia_llava_v6b_train.json |
| CoT style | Short CoT in `<think>` blocks |
| CoT length | mean 30.9 words, median 30 |
| Epochs | 3 |
| Global batch | 64 (8 replicas x 8 per replica) |
| Learning rate | 1e-5 (cosine decay, 3% warmup) |
| Context length | 8192 |
| Video input | 8 frames |
| Hardware | 8x A100-SXM4-80GB |
| Run dir | outputs/egonormia_sft_v6b/20260302131747/ |
| Uploaded checkpoint | step_160 / 456 total steps |
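The step counts in the table are internally consistent; a quick sanity check, assuming drop-last batching (the final partial batch of each epoch is discarded):

```python
# Sanity-check the optimizer step count from the training table,
# assuming the last partial batch of each epoch is dropped.
train_samples = 9780
global_batch = 64
epochs = 3

steps_per_epoch = train_samples // global_batch  # 152
total_steps = steps_per_epoch * epochs           # 456, matching the table

print(total_steps)  # 456
```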
## MCQ Evaluation (200 verified test samples)
### No-think
| Checkpoint | Action | Justification | Both | S-IoU | Parse |
|---|---|---|---|---|---|
| v6b step_160 | 78.5% | 88.5% | 70.5% | 0.6450 | 100.0% |
### Think mode
| Checkpoint | Action | Justification | Both | S-IoU | Parse |
|---|---|---|---|---|---|
| v6b step_160 + think | 81.5% | 95.0% | 77.5% | 0.6292 | 100.0% |
| v6b step_320 + think | 80.5% | 96.0% | 77.5% | 0.6254 | 100.0% |
## Open-ended generation evaluation
The best checkpoint, `step_160`, was also evaluated on 50 open-ended test samples with greedy decoding.
| Comparison | v3 avg | v6b avg | v3 wins | v6b wins | Ties |
|---|---|---|---|---|---|
| v3 vs v6b | 2.99 | 3.38 | 18 (38%) | 29 (60%) | 1 |
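The win/tie columns can be aggregated from per-sample judge scores; a hypothetical tally over `(score_v3, score_v6b)` pairs (the actual evaluation harness is not shown in this card):

```python
# Hypothetical win/tie tally over pairwise judge scores (score_v3, score_v6b);
# the real evaluation harness is not part of this card.
def tally(pairs):
    v3_wins = sum(a > b for a, b in pairs)
    v6b_wins = sum(b > a for a, b in pairs)
    ties = sum(a == b for a, b in pairs)
    return v3_wins, v6b_wins, ties

print(tally([(3, 4), (2, 2), (5, 1), (1, 5)]))  # (1, 2, 1)
```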
## Notes
- The repo name keeps the historical `shortcot` suffix, but this model is actually the 6-task v6b generation variant.
- In no-think mode, v6b does not clear the 77% MCQ gate, but think mode does: both `step_160` and `step_320` reach 77.5% both-correct accuracy.
- Compared with v6, adding the sixth generation task stabilizes parsing instead of hurting it: parse stays at 100% on nearly all checkpoints.
- v6b is mainly interesting when you care about open-ended generation quality or think-mode MCQ performance.
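The model emits a short `<think>` trace before its final answer; a minimal sketch of stripping the trace and pulling out an MCQ option letter (the eval harness's actual parser is assumed, not shown here):

```python
import re

def parse_answer(text: str):
    # Drop the <think>...</think> trace, then grab the first standalone option letter.
    body = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    m = re.search(r"\b([A-E])\b", body)
    return m.group(1) if m else None

out = "<think>The person should wait politely before acting.</think> Answer: B"
print(parse_answer(out))  # B
```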
## Usage
```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v6b-shortcot",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("robertzty/EgoNormia-Cosmos-Reason2-2B-v6b-shortcot")
```
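For video MCQ inference, the chat messages follow the Qwen3-VL content-list convention; a sketch (the video path and question text are placeholders):

```python
# Hypothetical message layout for video MCQ inference; "clip.mp4" and the
# question text are placeholders, and field names follow the Qwen3-VL chat format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "clip.mp4"},  # sampled to 8 frames, as in training
            {"type": "text", "text": "Which action is most socially appropriate? ..."},
        ],
    }
]

# With the model and processor loaded as above:
# inputs = processor.apply_chat_template(
#     messages, tokenize=True, add_generation_prompt=True,
#     return_dict=True, return_tensors="pt",
# ).to(model.device)
# output_ids = model.generate(**inputs, max_new_tokens=256)
# print(processor.decode(output_ids[0], skip_special_tokens=True))

print(messages[0]["role"])  # user
```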