Article • Mixture of Experts (MoEs) in Transformers • ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26
Article • Did GPT 5.2 make a breakthrough discovery in theoretical physics? • dlouapre • Feb 19
Article • Diffusers welcomes FLUX-2 • YiYiXu, dg845, sayakpaul, OzzyGT, dn6, ariG23498, linoyts, multimodalart • Nov 25, 2025
Article • Scaling Test-Time Compute to Achieve Gold Medal at IOI 2025 with Open-Weight Models • nvidia • Oct 20, 2025
Article • SmolLM3: smol, multilingual, long-context reasoner • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025
Article • Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers • ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025
Article • Vision Language Model Alignment in TRL ⚡️ • sergiopaniego, merve, qgallouedec, kashif, ariG23498 • Aug 7, 2025
Paper • FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language • 2506.20920 • Published Jun 26, 2025
Article • nanoVLM: The simplest repository to train your VLM in pure PyTorch • ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025
Paper • SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • 2502.02737 • Published Feb 4, 2025
Article • Trace & Evaluate your Agent with Arize Phoenix • schavalii, jgilhuly16, m-ric • Feb 28, 2025
Paper • Better & Faster Large Language Models via Multi-token Prediction • 2404.19737 • Published Apr 30, 2024
Article • SmolVLM - small yet mighty Vision Language Model • andito, merve, mfarre, eliebak, pcuenq • Nov 26, 2024
Article • WWDC 24: Running Mistral 7B with Core ML • pcuenq, FL33TW00D-HF, reach-vb, osanseviero • Jul 22, 2024
Article • How NuminaMath Won the 1st AIMO Progress Prize • yfleureau, liyongsea, edbeeching, lewtun, benlipkin, romansoletskyi, vwxyzjn, kashif • Jul 11, 2024