Qwen3-4B Code Fine-Tuned

Fine-tuned Qwen3-4B on 10K verified reasoning traces from rStar-Coder (1 epoch SFT).

Optimized for algorithmic/competitive programming tasks.

📊 Performance (EvalPlus Framework)

| Benchmark | Base tests | Plus tests | vs base model |
|-----------|------------|------------|---------------|
| HumanEval | 68.9%      | 64.0%      | +6.9% ✅      |
| MBPP      | 58.2%      | 50.8%      | -8.8% ⚠️      |

Evaluated with the EvalPlus harness using greedy decoding. "Base tests" are the original benchmark test suites; "Plus tests" are EvalPlus's extended test suites.
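The "vs base model" column follows directly from the untuned Qwen3-4B scores listed in the Comparison section below (62% HumanEval, 67% MBPP); a quick sanity check:

```python
# Reproduce the "vs base model" deltas: this model's EvalPlus base-test
# scores minus the untuned Qwen3-4B scores from the Comparison section.
base_model = {"HumanEval": 62.0, "MBPP": 67.0}
this_model = {"HumanEval": 68.9, "MBPP": 58.2}

for bench in base_model:
    delta = round(this_model[bench] - base_model[bench], 1)
    print(f"{bench}: {delta:+.1f}%")  # HumanEval: +6.9%, MBPP: -8.8%
```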

Performance Trade-off

  • ✅ Improved on complex algorithmic tasks (HumanEval: 62% → 68.9%)
  • ⚠️ Regression on simple practical tasks (MBPP: 67% → 58.2%)

Why? The model was trained on competition-style problems (LeetCode, Codeforces), which emphasize algorithmic reasoning over simple utility functions.

Use this model if: You need help with algorithms, data structures, competitive programming
Use base model if: You need simple utility functions, basic string/list operations

🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "prometheus04/qwen3-4b-code-finetuned",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    "prometheus04/qwen3-4b-code-finetuned", trust_remote_code=True
)

# Complete a function
messages = [
    {"role": "system", "content": "You are a programming expert."},
    {"role": "user", "content": "def fibonacci(n):\n    "},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding, matching the evaluation setup. Temperature is ignored
# when do_sample=False, so it is omitted here.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
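Note that `generate` returns the prompt tokens followed by the completion, so decoding `outputs[0]` reprints the prompt. Slicing off the prompt length yields only the new code. A minimal sketch of the pattern with stand-in token ids (with the real model, the prompt length is `inputs["input_ids"].shape[-1]` and you slice `outputs[0]` the same way):

```python
# Stand-in integer token ids, used here so the slicing pattern is visible
# without loading the model.
prompt_ids = [101, 2023, 2003, 1037]          # hypothetical prompt tokens
output_ids = prompt_ids + [7592, 2088, 102]   # generate() echoes the prompt

completion_ids = output_ids[len(prompt_ids):]
print(completion_ids)  # [7592, 2088, 102]
```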

πŸ“ Training Details

  • Base Model: Qwen/Qwen3-4B (4B parameters)
  • Dataset: microsoft/rStar-Coder synthetic_sft (10K samples)
    • Competition problems from LeetCode, Codeforces, etc.
    • Execution-verified solutions with reasoning traces
  • Method: LoRA fine-tuning
    • Rank: 32
    • Alpha: 64
    • Target modules: All linear layers (q,k,v,o,gate,up,down)
    • rsLoRA: Enabled
  • Training:
    • Epochs: 1
    • Batch size: 2 × 8 grad accum = 16 effective
    • Learning rate: 2e-4 (cosine schedule)
    • Optimizer: AdamW 8-bit
    • Max seq length: 4096
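Two of the numbers above can be derived directly: the effective batch size is micro-batch × gradient-accumulation steps, and rsLoRA replaces the standard LoRA scaling factor α/r with α/√r, which keeps adapter updates from shrinking at higher ranks. A quick sketch with the hyperparameters listed here:

```python
import math

# Effective batch size = per-device micro-batch x gradient accumulation steps
micro_batch, grad_accum = 2, 8
effective_batch = micro_batch * grad_accum
print(effective_batch)  # 16

# rsLoRA scales the adapter output by alpha / sqrt(r) instead of alpha / r
r, alpha = 32, 64
standard_scale = alpha / r            # 2.0
rslora_scale = alpha / math.sqrt(r)   # ~11.31
print(round(rslora_scale, 2))  # 11.31
```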

💡 Key Features

✅ Trained on execution-verified competition solutions
✅ Curriculum learning (easy → hard)
✅ Decontaminated from HumanEval/MBPP
✅ Efficient LoRA (1.62% trainable params)
✅ Production-ready merged weights

📈 Comparison

| Model          | HumanEval | MBPP  | Specialization |
|----------------|-----------|-------|----------------|
| Qwen3-4B Base  | 62%       | 67%   | General        |
| This Model     | 68.9%     | 58.2% | Algorithms     |
| GPT-3.5-turbo  | ~75%      | ~70%  | General        |

🎯 Strengths

  • Binary search, dynamic programming, graph algorithms
  • Recursion, backtracking, tree traversal
  • Complex data structure manipulation
  • Competitive programming patterns
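As a concrete example of the problem style this model targets, here is a standard iterative binary search (an illustrative sample written for this card, not model output):

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))  # -1
```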

⚠️ Limitations

  • Not recommended for simple utility functions (use base model instead)
  • Trained on Python-only data
  • May overthink simple problems
  • Narrowly specialized for algorithmic/competitive programming tasks
  • Optimal for functions <4K tokens

🔧 Recommended Use Cases

✅ LeetCode/HackerRank style problems
✅ Algorithm implementation
✅ Data structure coding
✅ Competitive programming practice
✅ Technical interview preparation

❌ Simple string manipulation
❌ Basic list operations
❌ Trivial utility functions

📄 Citation

```bibtex
@misc{qwen3-4b-code-finetuned,
  author = {prometheus04},
  title = {Qwen3-4B Code Fine-Tuned on rStar-Coder},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/prometheus04/qwen3-4b-code-finetuned}},
}
```

📜 License

Apache 2.0 (inherited from base model)
