OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization Paper • 2605.21226 • Published 3 days ago • 8
Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs Paper • 2605.20315 • Published 4 days ago • 26
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond Paper • 2605.19660 • Published 4 days ago • 38
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation Paper • 2605.19833 • Published 4 days ago • 124
mlx-community/gemma-4-E4B-it-OBLITERATED-mlx-8Bit Text Generation • 8B • Updated 3 days ago • 6.11k • 2