# TurkWeb-Edu Student (Reasoning)

A Turkish educational content scorer that generates reasoning before scoring: the Turkish counterpart of the FineWeb-Edu classifier, trained via generative reasoning distillation.
## How It Works
- You send Turkish text.
- The model first thinks, generating its reasoning in Turkish.
- It then outputs an educational quality score from 0 to 5.
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "YsK-dev/TurkWeb-Edu-Student-Qwen1.5B-SOTA",
    dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("YsK-dev/TurkWeb-Edu-Student-Qwen1.5B-SOTA")

messages = [
    {"role": "system", "content": "You are an educational quality classifier."},
    {"role": "user", "content": "Analyze the following Turkish text for educational value (0-5):\n\n<your text>\n\nProvide your reasoning and final score."},
]

# Build the chat prompt and tokenize it
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
enc = tokenizer(prompt, return_tensors="pt").to(model.device)

# Low temperature keeps the reasoning and score near-deterministic
output = model.generate(**enc, max_new_tokens=300, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens (reasoning + score)
print(tokenizer.decode(output[0][enc["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Training Details
| Component | Value |
|---|---|
| Teacher | Qwen3-30B-A3B-Instruct-2507 |
| Student | Qwen/Qwen2.5-1.5B-Instruct |
| Method | SFT with reasoning distillation (LoRA r=64) |
| Data | 660K Turkish web samples from FineWeb-2 |
| Hardware | 1x NVIDIA H100 80GB |
| Steps | 20,000 |
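The LoRA setup in the table can be sketched with the `peft` library. Only the rank (r=64) comes from the table; the alpha, dropout, and target modules below are illustrative assumptions, not documented training settings:

```python
from peft import LoraConfig

# r=64 matches the training table; every other value is an assumption
lora_config = LoraConfig(
    r=64,                     # LoRA rank (from the table)
    lora_alpha=128,           # assumed scaling factor (not documented)
    lora_dropout=0.05,        # assumed dropout (not documented)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
```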