✅ 295B total / 21B active / 256K context ✅ Fused fast-and-slow thinking in a single model ✅ First model trained on Hunyuan's rebuilt pretraining + RL infra (Feb → Apr)
Benchmarks: 👉 SWE-Bench Verified, Terminal-Bench 2.0, BrowseComp, WideSearch — competitive results, particularly strong on agentic tool use 👉 Top score on Tsinghua's 2026 Spring math PhD qualifying exam 👉 Strong context-learning and instruction-following on Tencent's CL-bench / CL-bench-Life
HY-World-2.0 — A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds is now available on Spaces, and it works both as native Gradio components and in Gradio server mode.
Darwin-TTS: 3% of an LLM's Brain Makes TTS Speak with Emotion — Zero Training
We blended 3% of Qwen3-1.7B (LLM) FFN weights into Qwen3-TTS-1.7B's talker module. The result: emotionally enhanced speech synthesis — with zero training, zero data, and zero GPU hours.
Qwen3-1.7B (LLM) and Qwen3-TTS-1.7B's talker share an identical architecture — same hidden_size (2048), same layer count (28), same heads (16). This enabled pure 1:1 weight blending across 84 FFN tensors with a single lerp operation. At a 3% blend, emotion appears. At 5%, emotion intensifies. At 10%, the model breaks — producing 655-second outputs for a 3-second sentence, because the LLM's "keep generating" pattern overwhelms the TTS stop signal.
To our knowledge, this is the first training-free cross-modal weight transfer between an LLM and a TTS model. Prior work requires adapter training (SmolTolk, 2025), fine-tuning (CSLM, 2025), or massive end-to-end compute (GPT-4o). Darwin-TTS achieves cross-modal capability transfer in under 2 minutes on CPU.
The key insight: TTS models with LLM backbones already "think" in language. We're just restoring 3% of the original LLM's language understanding patterns — particularly those related to emotional semantics and prosody planning. The code is three lines: load the model, load the LLM FFN, call p.lerp_(llm_weight, 0.03).
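The three-line recipe above can be sketched as follows: a minimal, dependency-free illustration of the blend math. The tensor key names are placeholders (not the real checkpoint keys), plain lists stand in for the FFN weight matrices, and in PyTorch the same in-place operation is `p.lerp_(llm_weight, 0.03)`.

```python
# Sketch of the Darwin-TTS FFN blend, assuming matching tensor shapes.
# Plain lists stand in for the 84 matched FFN tensors; in PyTorch the
# equivalent in-place call is p.lerp_(llm_weight, alpha).

def lerp(tts_w, llm_w, alpha):
    # Linear interpolation: p <- p + alpha * (llm - p)
    return [t + alpha * (l - t) for t, l in zip(tts_w, llm_w)]

# Toy stand-ins for one matching FFN tensor (illustrative key name).
tts_ffn = {"layers.0.mlp.gate_proj": [0.10, -0.20, 0.30]}
llm_ffn = {"layers.0.mlp.gate_proj": [0.50, 0.00, -0.10]}

ALPHA = 0.03  # 3%: emotion appears; 5%: intensifies; 10%: generation breaks

blended = {k: lerp(v, llm_ffn[k], ALPHA) for k, v in tts_ffn.items()}
```

At ALPHA = 0, the talker is untouched; at ALPHA = 1, its FFNs are fully replaced by the LLM's, so the 3% setting restores only a thin slice of the original language patterns.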
We are the creators of the Darwin Evolutionary Merge Framework. Darwin LLM V7 achieved 86.9% on GPQA Diamond (HF Benchmark #3) through CMA-ES-optimized FFN crossbreeding. Darwin-TTS extends this principle from LLM-to-LLM merging to cross-modal LLM-to-TTS transfer. Released under Apache 2.0.
We added multi-simulator support for our parallel gripper on the SO-ARM 100/101 arm. One URDF, one launch file, five physics engines: Gazebo, MuJoCo, Webots, CoppeliaSim, Isaac Sim.
The ROS2 package includes xacro descriptions, joint trajectory controllers, robot state publisher, and parameterized hardware interfaces for each simulator.
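A "one URDF, one launch file" setup like the one described could look like this minimal sketch. The package and file names here are hypothetical (not the actual repo layout), and the assumption is that the xacro switches its hardware interface based on a `simulator` argument; the launch file processes the shared xacro and starts robot_state_publisher.

```python
# Hypothetical launch-file sketch: one xacro, simulator chosen by argument.
# Package and file names are illustrative, not the actual repo layout.
import os
from ament_index_python.packages import get_package_share_directory
from launch import LaunchDescription
from launch.actions import DeclareLaunchArgument
from launch.substitutions import Command, LaunchConfiguration
from launch_ros.actions import Node


def generate_launch_description():
    pkg = get_package_share_directory('so_arm_gripper_description')  # hypothetical
    xacro_file = os.path.join(pkg, 'urdf', 'so_arm_gripper.urdf.xacro')

    # The backend argument selects which hardware interface the xacro emits.
    sim_arg = DeclareLaunchArgument(
        'simulator', default_value='gazebo',
        description='gazebo | mujoco | webots | coppeliasim | isaac')

    robot_description = Command(
        ['xacro ', xacro_file, ' simulator:=', LaunchConfiguration('simulator')])

    rsp = Node(
        package='robot_state_publisher',
        executable='robot_state_publisher',
        parameters=[{'robot_description': robot_description}])

    return LaunchDescription([sim_arg, rsp])
```

Keeping the simulator choice inside the xacro, rather than in five separate launch files, is what lets a single description serve all five physics engines.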
The demo for image detection based on SAM3 and Gemma-4 (*Filter) is now available on Spaces, using full-fledged Transformers inference with multimodal reasoning over processed images. It also supports video segmentation (mask), video segmentation (annotation), and image click segmentation.