KATIB 0.8B v0.1 — Arabic OCR Model

KATIB (كاتب) is a fine-tuned Arabic OCR model built on Qwen3.5-0.8B, designed to accurately transcribe Arabic text from images — including printed documents and handwritten content.

Despite being a 0.8B parameter model, KATIB outperforms the larger 2B-parameter Qari-OCR v0.3 on standard Arabic OCR benchmarks, while running at roughly 2× the speed with about half the memory footprint.


✨ Highlights

  • 🏆 Outperforms Qari-OCR v0.3 (2B) on WER, CER, and BLEU
  • 🥈 Competitive with the stronger Qari-OCR v0.2.2.1 (2B) at 2.5× smaller size
  • ✍️ Enhanced handwriting support — better generalization to real-world Arabic scripts
  • ⚡ 2× faster inference compared to 2B-parameter alternatives
  • 🪶 Lightweight — deployable on modest hardware

📊 Benchmark Results

Evaluated on an Arabic OCR test set. Lower WER/CER is better; higher BLEU is better.

| Model | Size | WER ↓ | CER ↓ | BLEU ↑ |
|-------|------|-------|-------|--------|
| **KATIB 0.8B v0.1 (ours)** | 0.8B | 0.2386 | 0.0648 | 0.5819 |
| NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct | 2B | 0.2643 | 0.0782 | 0.5520 |
| NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct | 2B | 0.1993 | 0.0498 | 0.6402 |
| Qwen/Qwen3.5-0.8B (base, no fine-tune) | 0.8B | 2.5834 | 1.9487 | 0.0256 |

WER = Word Error Rate | CER = Character Error Rate | BLEU = Bilingual Evaluation Understudy Score
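To illustrate how these metrics work (this is a generic sketch, not the evaluation script used to produce the table above), WER and CER both reduce to edit distance: WER is Levenshtein distance over word tokens divided by the reference word count, and CER is the same over characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                         # deletion
                dp[j - 1] + 1,                     # insertion
                prev + (ref[i - 1] != hyp[j - 1])  # substitution (or match)
            )
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```

In practice, evaluation scripts typically use a library such as `jiwer` rather than hand-rolled code, and may apply text normalization before scoring, which can shift the numbers.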

Key Takeaways

  • KATIB beats Qari v0.3 across all three metrics, despite being 2.5× smaller.
  • KATIB comes close to Qari v0.2.2.1 on WER and CER, with only a ~6-point BLEU gap, a strong result for a model of this size.
  • The base Qwen model without fine-tuning is essentially unusable for Arabic OCR (WER > 2.5), demonstrating the value of domain-specific fine-tuning.

🚀 Quick Start

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

model_id = "oddadmix/Katib-Qwen3.5-0.8B-0.1"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

image = Image.open("arabic_document.jpg")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Free OCR"}
        ]
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512)

result = processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
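When comparing OCR output against references, Arabic evaluations often normalize text first, since diacritized and undiacritized renderings of the same word would otherwise count as errors. A minimal, hypothetical post-processing sketch (not part of this model's pipeline) that strips tashkeel marks before comparison:

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks, e.g. Arabic tashkeel (fatha, damma, kasra, shadda, sukun)."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))
```

Whether to normalize depends on the ground truth: if references keep diacritics, stripping them from both sides makes WER/CER insensitive to vowelization differences.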

🧪 Training Details

| Detail | Value |
|--------|-------|
| Base Model | Qwen/Qwen3.5-0.8B |
| Fine-tuning Method | Supervised Fine-Tuning (SFT) |
| Language | Arabic (Modern Standard + Handwritten) |
| Task | Optical Character Recognition (OCR) |
| Precision | float16 / bfloat16 |

📋 Intended Use

  • ✅ Arabic document digitization
  • ✅ Handwritten Arabic text recognition
  • ✅ Arabic printed text extraction from images
  • ✅ Low-resource / edge deployment scenarios
  • ❌ Not intended for non-Arabic languages
  • ❌ Not a general-purpose vision-language model

⚠️ Limitations

  • Performance may degrade on very low-quality or heavily degraded scans.
  • Dialectal Arabic and mixed-language (Arabic + Latin) text may reduce accuracy.
  • Extreme cursive or stylized calligraphy has not been extensively evaluated.

📄 Citation

If you use KATIB in your research or application, please consider citing this model:

@misc{katib2025,
  title     = {KATIB 0.8B v0.1: A Lightweight Arabic OCR Model},
  author    = {oddadmix},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/oddadmix/Katib-Qwen3.5-0.8B-0.1}
}
