Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning • Paper • 2602.01058 • Published 12 days ago • 39
is-sft-271828/synlogic_grpo_qwen3-4b-base-qwen3-8b-3e-5-seq-seqlen_8192_chunksize_8 4B • Updated 17 days ago • 4
is-sft-271828/synlogic_grpo_qwen3-4b-base-qwen3-8b-3e-5-seq-seqlen_8192_chunksize_4 4B • Updated 17 days ago • 3
is-sft-271828/math_grpo_qwen3-4b-base-qwen3-8b-3e-5-seq-seqlen_8192_chunksize_8 4B • Updated 17 days ago • 4
is-sft-271828/math_grpo_qwen3-4b-base-qwen3-8b-3e-5-seq-seqlen_8192_chunksize_4 4B • Updated 17 days ago • 10