SOKRATES: Qwen3-8B PrOntoQA OaK-DPO Iteration 1
First DPO iteration of the SOKRATES Qwen3-8B model, reaching 96.8% accuracy on PrOntoQA (up from 93.3% after SFT).
Performance
| Stage | Accuracy |
|---|---|
| SFT | 93.3% |
| DPO Iter 1 | 96.8% |
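To put the two rows in perspective, the short sketch below computes the absolute gain and the relative reduction in error rate. It uses only the two accuracy figures from the table; the variable names are illustrative.

```python
# Accuracy figures taken from the table above (percent).
sft_acc = 93.3
dpo_acc = 96.8

# Absolute improvement in percentage points.
abs_gain = dpo_acc - sft_acc

# Relative error reduction: the share of SFT errors that no longer occur after DPO.
rel_error_reduction = abs_gain / (100.0 - sft_acc) * 100.0

print(f"Absolute gain: {abs_gain:.1f} points")
print(f"Relative error reduction: {rel_error_reduction:.1f}%")
```

By this measure the DPO iteration removes roughly half of the errors that remained after SFT.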
Usage
```python
from transformers import AutoModelForCausalLM

# Load the DPO iteration 1 checkpoint from the Hugging Face Hub in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    "Moonlight556/sokrates-qwen3-8b-prontoqa-oak-dpo-iter1",
    torch_dtype="bfloat16",
)
```
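Loading the model is only half of a typical workflow, so here is a minimal sketch of running one PrOntoQA-style query end to end. It rests on assumptions: the repository is assumed to ship tokenizer files, and the prompt layout in `build_prompt` is hypothetical, since this card does not state the format used during training. `build_prompt` and `generate_answer` are illustrative helper names, not part of any library.

```python
def build_prompt(context: str, query: str) -> str:
    """Hypothetical prompt layout; the training format is not stated in this card."""
    return f"Context: {context}\nQuestion: True or false: {query}\nAnswer:"


def generate_answer(context: str, query: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so build_prompt can be used without loading transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "Moonlight556/sokrates-qwen3-8b-prontoqa-oak-dpo-iter1"
    # Assumption: the repository includes tokenizer files alongside the weights.
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="bfloat16")

    prompt = build_prompt(context, query)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Greedy decoding keeps the output deterministic for a reasoning benchmark.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("Every cat is a mammal. Tom is a cat.", "Tom is a mammal.")` downloads the weights on first use and returns the model's continuation as a string. If the checkpoint expects a chat template or a different prompt layout, adjust `build_prompt` accordingly.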