Nemotron Speech Collection Open, state-of-the-art, production-ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization, and S2S • 15 items • Updated about 13 hours ago • 14
Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular • 26 days ago • 112
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory. • 29 items • Updated Aug 14, 2025 • 32
VideoChat-R1 Collection VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning • 4 items • Updated Sep 28, 2025 • 8
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory. • 15 items • Updated Jul 10, 2025 • 215
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models Paper • 2504.02882 • Published Apr 2, 2025 • 7
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 203
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers Paper • 2504.00502 • Published Apr 1, 2025 • 26
Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published Apr 3, 2025 • 57
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published Apr 1, 2025 • 67
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3, 2025 • 32