---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-0.5B
---

# Model Card for MergeVLA-LIBERO

MergeVLA — single-skill experts for Spatial / Object / Goal / Long-10 (LIBERO task suites). These models serve as the base expert checkpoints for MergeVLA.

## Model Details

Each uploaded model is a 0.68B-parameter VLA model *(excluding the vision backbone)* composed of:

- Qwen2.5-0.5B as the Vision-Language Model (VLM)
- A lightweight 0.18B action expert
- A two-layer proprioceptive projector MLP

### ✔️ **Performance (Success Rates on LIBERO)**

| Task Family | Success Rate (%) |
| ----------- | ---------------- |
| **Spatial** | **98.0**         |
| **Object**  | **98.6**         |
| **Goal**    | **95.0**         |
| **Long-10** | **95.0**         |

### 🧠 **Training Details**

Each expert is fine-tuned independently on modified LIBERO demonstrations in RLDS format.

| Category            | Value                         |
| ------------------- | ----------------------------- |
| LoRA                | Enabled (rank = 64)           |
| Optimizer           | AdamW                         |
| Learning Rate       | 2e-4                          |
| Batch Size          | 8 (×2 gradient accumulation)  |
| num_images_in_input | 2                             |

### **Training Steps**

* **Spatial** — 30,000
* **Object** — 20,000
* **Goal** — 30,000
* **Long-10** — 50,000

## Citation

```BibTeX
@misc{fu2025mergevla,
  title={MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent},
  author={Yuxia Fu and Zhizhen Zhang and Yuqi Zhang and Zijian Wang and Zi Huang and Yadan Luo},
  year={2025},
  eprint={2511.18810},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2511.18810},
}
```
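The checkpoints above are per-skill experts intended as inputs to cross-skill model merging. As a generic illustration only — plain uniform parameter averaging, *not* the MergeVLA merging algorithm from the paper — combining expert state dicts can be sketched as follows (the toy parameter names are hypothetical):

```python
def average_expert_weights(state_dicts):
    """Uniformly average parameters across expert checkpoints.

    Assumes all state dicts share identical keys and shapes, as they
    would when each expert is fine-tuned from the same base model.
    """
    keys = state_dicts[0].keys()
    return {k: sum(sd[k] for sd in state_dicts) / len(state_dicts) for k in keys}

# Toy stand-ins for two expert checkpoints (real ones map names to tensors).
spatial_expert = {"layer.weight": 1.0, "layer.bias": 0.5}
object_expert = {"layer.weight": 3.0, "layer.bias": 1.5}

merged = average_expert_weights([spatial_expert, object_expert])
# merged["layer.weight"] == 2.0, merged["layer.bias"] == 1.0
```

Real checkpoints store tensors rather than floats, so in practice the sum/division would be done with `torch` on each tensor; the structure of the averaging loop is the same.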