# KATIB 0.8B v0.1 — Arabic OCR Model
KATIB (كاتب) is a fine-tuned Arabic OCR model built on Qwen3.5-0.8B, designed to accurately transcribe Arabic text from images — including printed documents and handwritten content.
Despite being a 0.8B parameter model, KATIB outperforms larger 2B-class Arabic OCR models on standard benchmarks while running at 2× the speed with half the memory footprint.
## ✨ Highlights
- 🏆 Outperforms Qari-OCR v0.3 (2B) on WER, CER, and BLEU
- 🥈 Competitive with Qari-OCR v0.2.2.1 (2B) — a stronger model — at half the size
- ✍️ Enhanced handwriting support — better generalization to real-world Arabic scripts
- ⚡ 2× faster inference compared to 2B-parameter alternatives
- 🪶 Lightweight — deployable on modest hardware
## 📊 Benchmark Results
Evaluated on an Arabic OCR test set. Lower WER/CER is better; higher BLEU is better.
| Model | Size | WER ↓ | CER ↓ | BLEU ↑ |
|---|---|---|---|---|
| KATIB 0.8B v0.1 (ours) | 0.8B | 0.2386 | 0.0648 | 0.5819 |
| NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct | 2B | 0.2643 | 0.0782 | 0.5520 |
| NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct | 2B | 0.1993 | 0.0498 | 0.6402 |
| Qwen/Qwen3.5-0.8B (base, no fine-tune) | 0.8B | 2.5834 | 1.9487 | 0.0256 |
*WER = Word Error Rate · CER = Character Error Rate · BLEU = Bilingual Evaluation Understudy Score*
### Key Takeaways
- KATIB beats Qari v0.3 across all three metrics — despite being 2.5× smaller.
- KATIB comes close to Qari v0.2.2.1 on WER and CER, with a BLEU gap of only ~0.06 — a strong result for a model at this size.
- The base Qwen model without fine-tuning is essentially unusable for Arabic OCR (WER > 2.5), demonstrating the value of domain-specific fine-tuning.
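The WER and CER metrics in the table are standard edit-distance ratios and can be reproduced without any OCR-specific tooling. A minimal sketch (the example strings are illustrative, not from the benchmark set):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance over two sequences.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,            # deletion
                cur[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y)  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    # Word Error Rate: word-level edit distance / reference word count.
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character Error Rate: char-level edit distance / reference length.
    ref_chars = reference.replace(" ", "")
    return levenshtein(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)

print(wer("مرحبا بالعالم اليوم هنا", "مرحبا في العالم اليوم هنا"))
```

Averaging these per-page scores over a test set yields table entries like the ones above.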
## 🚀 Quick Start
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

model_id = "oddadmix/Katib-Qwen3.5-0.8B-0.1"

# Load the processor and model (fp16, placed automatically on GPU if available)
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Build a chat-style OCR request around the input image
image = Image.open("arabic_document.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Free OCR"},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt
result = processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
```
## 🧪 Training Details
| Detail | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-0.8B |
| Fine-tuning Method | Supervised Fine-Tuning (SFT) |
| Language | Arabic (Modern Standard + Handwritten) |
| Task | Optical Character Recognition (OCR) |
| Precision | float16 / bfloat16 |
## 📋 Intended Use
- ✅ Arabic document digitization
- ✅ Handwritten Arabic text recognition
- ✅ Arabic printed text extraction from images
- ✅ Low-resource / edge deployment scenarios
- ❌ Not intended for non-Arabic languages
- ❌ Not a general-purpose vision-language model
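For the low-resource and edge deployment scenarios above, one option is to load the model with 4-bit quantization via the `bitsandbytes` integration in `transformers`. A hedged sketch (not part of the official model card; assumes a CUDA device with `bitsandbytes` installed, and actual quality/speed trade-offs should be verified on your own data):

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig

model_id = "oddadmix/Katib-Qwen3.5-0.8B-0.1"

# NF4 4-bit weights roughly quarter the weight memory relative to fp16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

The rest of the Quick Start flow (chat template, `generate`, decode) is unchanged.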
## ⚠️ Limitations
- Performance may degrade on very low-quality or heavily degraded scans.
- Dialectal Arabic and mixed-language (Arabic + Latin) text may reduce accuracy.
- Extreme cursive or stylized calligraphy has not been extensively evaluated.
## 📄 Citation
If you use KATIB in your research or application, please consider citing this model:
```bibtex
@misc{katib2025,
  title     = {KATIB 0.8B v0.1: A Lightweight Arabic OCR Model},
  author    = {oddadmix},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/oddadmix/Katib-Qwen3.5-0.8B-0.1}
}
```