AdaSPEC: EAGLE3 Draft Model for Qwen3-8B

A reimplementation of the AdaSPEC (adaptive speculative decoding) training objective, included as a baseline for comparison.

Part of a course project evaluating per-step weighted loss functions for training EAGLE3 draft models. Full pipeline and source: https://github.com/XLOverflow/anlp_course_project

Collection: Qwen3 EAGLE3 - Weighted Loss Variants

Training

  • Framework: SpecForge (our fork: https://github.com/XLOverflow/SpecForge)
  • Target model: Qwen/Qwen3-8B
  • Draft init: AngelSlim/Qwen3-8B_eagle3
  • Data: ShareGPT-style reasoning traces (see scripts/data/ in project repo)
  • Loss: AdaSPEC adaptive loss (see the AdaSPEC paper; a minimal sketch follows this list)
  • Initialized from: baseline-uniform/epoch_4_step_82000
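
For reference, here is a minimal PyTorch sketch of a selective distillation loss in the spirit of AdaSPEC. The function name, keep_ratio, and the keep-the-easiest-tokens selection rule are illustrative assumptions; the exact objective is the one in the AdaSPEC paper as implemented in our SpecForge fork.

import torch
import torch.nn.functional as F

def adaspec_style_loss(draft_logits, target_logits, keep_ratio=0.5):
    # Per-token distillation loss KL(target || draft) over the vocabulary.
    log_p_draft = F.log_softmax(draft_logits, dim=-1)   # [batch, seq, vocab]
    p_target = F.softmax(target_logits, dim=-1)
    kl = (p_target * (p_target.clamp_min(1e-9).log() - log_p_draft)).sum(-1)  # [batch, seq]

    # Selective KD (assumed rule): distill only on the keep_ratio fraction of
    # tokens where the draft is already closest to the target, dropping
    # hard-to-fit tokens from the objective instead of averaging over all.
    flat = kl.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.kthvalue(k).values
    mask = (kl <= threshold).float()
    return (kl * mask).sum() / mask.sum().clamp_min(1.0)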

Evaluation (Qwen3-8B target)

Dataset    τ (mean accepted length)    Speedup    Accuracy
GSM8K      6.856                       4.289×     95.15%
MATH500    6.678                       4.206×     94.40%

Baselines for reference: vanilla decoding ≈ 1× speedup, the original EAGLE draft ≈ 2× speedup.
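
The gap between τ and the measured speedup reflects per-cycle drafting and verification overhead. A back-of-the-envelope check (this simple cost model is an assumption, not from the paper):

tau, speedup = 6.856, 4.289  # GSM8K row above
# If each cycle emits ~tau tokens for one target verify pass plus drafting
# overhead worth c target-forward-equivalents, then speedup ~= tau / (1 + c).
c = tau / speedup - 1
print(f"implied per-cycle overhead: {c:.2f} target forwards")  # ~0.60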

Files

  • model.safetensors: draft model weights (~763 MB, roughly 0.4B BF16 parameters)
  • config.json: model config
  • Corresponds to outputs/eagle3-adaspec/epoch_0_step_17026 in the original training output directory

Optimizer state (~3 GB) is not uploaded, so training cannot be resumed from this checkpoint directly; use the project repo's training scripts to retrain from scratch if needed.

Usage

from huggingface_hub import snapshot_download
draft_path = snapshot_download(repo_id="XLOverflow/qwen3-eagle3-adaspec")
# Then load with EAGLE's EaModel; see scripts/eval/eval_combined.py in the project repo.
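
Before wiring the checkpoint into an evaluation harness, it can help to sanity-check the downloaded weights. A small sketch using the safetensors library (assumed installed; it ships as a dependency of recent transformers versions):

import os
from safetensors import safe_open

with safe_open(os.path.join(draft_path, "model.safetensors"), framework="pt") as f:
    for name in list(f.keys())[:5]:  # print a few tensor names and shapes
        print(name, f.get_slice(name).get_shape())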