What happens when you make an LLM drive a car where physics are real and actions can't be undone?
I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.
The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.
In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.
The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.
This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.
@CohereLabs just released πΏ Tiny Aya: a fully open-source 3B parameter model that speaks 70+ languages π! But thereβs a catch:
Tiny Aya is just a language model. It doesnβt support tool calling, the key capability that turns frontier models into powerful *agents*. So the real question is:
How hard is it to turn Tiny Aya into an agent?
Turns outβ¦ itβs simple, thanks to Hugging Face TRL. Weβre sharing a hands-on example showing how to train Tiny Aya to turn it into a tool-calling agent using TRL, unlocking what could become the first *massively multilingual open agent*.
if you're looking for a good first issue to get your open-source journey started, you could contribute to this TRL issue by documenting one impactful paper in the docs
FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly from the browser, with no coding knowledge required, using TRL behind the scenes.
It includes GDPO, the latest variant of GRPO for multi-reward RL β¨ GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence β developed by @sliuau@SimonX et al.
Recursive Language Models (RLM) is a new interface for LLMs with cool ideas by Alex Zhang!
β οΈ LLMs struggle with long prompts β attention overload & lost info π RLMs inspect, split & call themselves on chunks, then aggregate results β Handles millions of tokens, reduces noise, improves reasoning π‘ System prompt guides recursion π― RLM trajectories can be used for RL training or distillation (OpenEnv+TRL!!)
We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!
1οΈβ£ Q1 β Learning to Reason Deepseek not only releases a top-notch reasoning model, but shows how to train them and compete with closed frontier models. OpenAI debuts Deep Research.
Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)
2οΈβ£ Q2 β Multimodality and Coding More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.
Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4
3οΈβ£ Q3 β "Gold" rush, OpenAI opens up, the community goes bananas Flagship models get gold in Math olympiads and hard benchmarks. OpenAI releases strong open source models and Google releases the much anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.
Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5
4οΈβ£ Q4 β Mistral returns, leaderboard hill-climbing Mistral is back with updated model families. All labs release impressive models to wrap up the year!
Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 π€―