βš–οΈ Kanoonu AI β€” Phi-3 GGUF (Q4_K_M)

Quantized GGUF version of Kanoonu AI β€” ready for local deployment



πŸ“– Overview

This is the GGUF quantized version of tejasgowda05/Kanoonu-AI-Phi3-Finetuned β€” a Phi-3-mini model fine-tuned on 23,370 Indian law Q&A pairs covering the Indian Penal Code (IPC), Code of Criminal Procedure (CrPC), Constitution of India, and other statutes.

The GGUF format allows this model to run locally on CPU or GPU without requiring a high-end machine, making Indian legal information accessible to everyone.


πŸ“¦ Available Files

| File | Quantization | Size | Use Case |
|------|--------------|------|----------|
| phi-3-mini-4k-instruct.Q4_K_M.gguf | Q4_K_M | ~2.2 GB | βœ… Recommended: best balance of size and quality |

What is Q4_K_M?

Q4_K_M is a 4-bit quantization method (with some tensors kept at higher precision) that compresses the model to ~2.2 GB with only a small quality loss compared to the full-precision version. It runs comfortably on most modern laptops.
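As a rough sanity check on the file size (assuming Q4_K_M averages about 4.85 bits per weight, the figure commonly cited for llama.cpp's K-quants, which mix 4-bit and 6-bit blocks):

```python
# Rough size estimate for the Q4_K_M file.
# Assumption: Q4_K_M averages ~4.85 bits per weight (effective rate is
# above 4 bits because some blocks are stored at higher precision).
params = 3.8e9            # Phi-3-mini parameter count
bits_per_weight = 4.85    # approximate effective rate for Q4_K_M
size_bytes = params * bits_per_weight / 8
print(f"~{size_bytes / 1e9:.1f} GB")  # prints ~2.3 GB, close to the 2.2 GB file
```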


πŸš€ Quick Start

Option 1 β€” Ollama (Easiest)

# Step 1 β€” Install Ollama from https://ollama.com/download

# Step 2 β€” Pull and run directly
ollama run hf.co/tejasgowda05/Kanoonu-AI-Phi3-GGUF:Q4_K_M

Option 2 β€” llama.cpp CLI

llama-cli -hf tejasgowda05/Kanoonu-AI-Phi3-GGUF --jinja

Option 3 β€” llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path = "./kanoonu_model/phi-3-mini-4k-instruct.Q4_K_M.gguf",
    n_ctx      = 2048,
    n_threads  = 4,
)

response = llm(
    "<|system|>\nYou are Kanoonu AI, an expert Indian legal assistant.\n<|end|>\n"
    "<|user|>\nWhat is an FIR and how is it filed in India?<|end|>\n"
    "<|assistant|>\n",
    max_tokens = 200,
    stop       = ["<|end|>", "<|endoftext|>"],
)

print(response["choices"][0]["text"])
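
The raw prompt string above follows Phi-3's chat template; a small helper (hypothetical, not part of any library) keeps the special tokens in one place:

```python
# Hypothetical helper that assembles a Phi-3-style chat prompt.
# The <|system|>/<|user|>/<|assistant|>/<|end|> markers follow the
# template used in the example above.
def build_phi3_prompt(system: str, user: str) -> str:
    return (
        f"<|system|>\n{system}<|end|>\n"
        f"<|user|>\n{user}<|end|>\n"
        "<|assistant|>\n"
    )

prompt = build_phi3_prompt(
    "You are Kanoonu AI, an expert Indian legal assistant.",
    "What is an FIR and how is it filed in India?",
)
# pass `prompt` to llm(...) exactly as in the example above
```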

Option 4 β€” Python with ctransformers

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "tejasgowda05/Kanoonu-AI-Phi3-GGUF",
    model_file = "phi-3-mini-4k-instruct.Q4_K_M.gguf",
    model_type = "llama",  # this GGUF uses the llama architecture, not mistral
)

print(llm("What are the fundamental rights in the Indian Constitution?"))

πŸ’» Hardware Requirements

| Setup | Minimum RAM | Performance |
|-------|-------------|-------------|
| CPU only | 8 GB RAM | Slow (~1-2 tokens/sec) |
| CPU + 8 GB RAM | 8 GB RAM | Moderate (~3-5 tokens/sec) |
| GPU (4 GB VRAM) | 4 GB VRAM | Fast (~15-20 tokens/sec) |
| GPU (8 GB VRAM) | 8 GB VRAM | Very fast (~30+ tokens/sec) |
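
The RAM figures above are dominated by the model file plus the KV cache. A back-of-the-envelope estimate, assuming Phi-3-mini's published dimensions (32 layers, 32 KV heads, head size 96) and an FP16 cache at the 2048-token context from the example:

```python
# Back-of-the-envelope memory estimate for running the Q4_K_M file.
# Assumed Phi-3-mini dimensions: 32 layers, 32 KV heads, head_dim 96.
# Treat these as illustrative, not authoritative.
model_file_gb = 2.2
n_layers, n_kv_heads, head_dim = 32, 32, 96
n_ctx = 2048
bytes_per_elem = 2  # FP16 KV cache
# factor of 2 for the key and value tensors
kv_cache_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
total_gb = model_file_gb + kv_cache_bytes / 1e9
print(f"KV cache: ~{kv_cache_bytes / 1e9:.2f} GB, total: ~{total_gb:.1f} GB")
# prints: KV cache: ~0.81 GB, total: ~3.0 GB
```

The remainder of the 8 GB minimum is headroom for the OS and the inference runtime's own buffers.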

πŸ—οΈ How This Was Created

microsoft/Phi-3-mini-4k-instruct  (3.8B base model)
         ↓
   QLoRA Fine-tuning
   (23,370 Indian law Q&A pairs)
         ↓
tejasgowda05/Kanoonu-AI-Phi3-Finetuned  (LoRA adapters)
         ↓
   Merge LoRA β†’ Convert to GGUF β†’ Quantize Q4_K_M
   (via Unsloth)
         ↓
tejasgowda05/Kanoonu-AI-Phi3-GGUF  ← you are here

| Training Metric | Value |
|-----------------|-------|
| Final Train Loss | 0.3478 |
| Best Eval Loss | 0.6568 |
| Training Examples | 23,370 |
| Training Time | ~270 minutes |

πŸ”— Related Resources

| Resource | Link |
|----------|------|
| πŸ€— LoRA Adapter | tejasgowda05/Kanoonu-AI-Phi3-Finetuned |
| πŸ€— GGUF Model (this repo) | tejasgowda05/Kanoonu-AI-Phi3-GGUF |
| πŸ€— Formatted Dataset | tejasgowda05/Indian-Kanoonu-Dataset |
| πŸ“¦ Base Model | microsoft/Phi-3-mini-4k-instruct |
| πŸ“¦ Original Dataset | viber1/indian-law-dataset |

⚠️ Limitations & Disclaimer

  • This model is intended for educational and informational purposes only
  • It is not a substitute for professional legal advice
  • Always consult a qualified lawyer for legal matters
  • The model may occasionally produce inaccurate or outdated legal information

πŸ‘€ Author

Tejas Gowda N β€” tejasgowda05

Built as part of the Kanoonu AI project β€” making Indian legal information accessible through conversational AI.


πŸ“„ License

Apache 2.0 β€” inherited from microsoft/Phi-3-mini-4k-instruct and the original dataset.


Trained 2x faster with Unsloth
