UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models Paper • 2601.01373 • Published Jan 4 • 1
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction Paper • 2604.27393 • Published 27 days ago • 76
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe Paper • 2509.18154 • Published Sep 16, 2025 • 61
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Paper • 2410.10594 • Published Oct 14, 2024 • 29
GUICourse: From General Vision Language Models to Versatile GUI Agents Paper • 2406.11317 • Published Jun 17, 2024 • 2
Post
Introducing GUICourse! By leveraging extensive OCR pretraining with grounding ability, we unlock the potential of parsing-free methods for GUI agents.
Paper: GUICourse: From General Vision Language Models to Versatile GUI Agents (2406.11317)
GitHub Repo: https://github.com/yiye3/GUICourse
Dataset: yiye2023/GUIAct / yiye2023/GUIChat / yiye2023/GUIEnv
Model: RhapsodyAI/minicpm-guidance / RhapsodyAI/qwen_vl_guidance