βš–οΈ Kanoonu AI β€” Phi-3 GGUF (Q4_K_M)

Quantized GGUF version of Kanoonu AI β€” ready for local deployment



πŸ“– Overview

This is the GGUF quantized version of tejasgowda05/Kanoonu-AI-Phi3-Finetuned β€” a Phi-3-mini model fine-tuned on 23,370 Indian law Q&A pairs covering the Indian Penal Code (IPC), Code of Criminal Procedure (CrPC), Constitution of India, and other statutes.

The GGUF format allows this model to run locally on CPU or GPU without requiring a high-end machine, making Indian legal information accessible to everyone.


πŸ“¦ Available Files

| File | Quantization | Size | Use Case |
|------|--------------|------|----------|
| phi-3-mini-4k-instruct.Q4_K_M.gguf | Q4_K_M | ~2.2 GB | βœ… Recommended: best balance of size and quality |

What is Q4_K_M?

Q4_K_M is a 4-bit quantization method (with some tensors kept at higher precision) that compresses the model to ~2.2 GB with only a small quality loss compared to the full-precision version. It runs comfortably on most modern laptops.
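As a rough sanity check on the file size (assuming Q4_K_M averages about 4.85 bits per weight, the figure commonly cited for llama.cpp's K-quants, which mix 4-bit and 6-bit blocks):

```python
# Rough size estimate for the Q4_K_M file.
# Assumption: Q4_K_M averages ~4.85 bits per weight (effective rate is
# above 4 bits because some blocks are stored at higher precision).
params = 3.8e9            # Phi-3-mini parameter count
bits_per_weight = 4.85    # approximate effective rate for Q4_K_M
size_bytes = params * bits_per_weight / 8
print(f"~{size_bytes / 1e9:.1f} GB")  # prints ~2.3 GB, close to the 2.2 GB file
```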


πŸš€ Quick Start

Option 1 β€” Ollama (Easiest)

# Step 1 β€” Install Ollama from https://ollama.com/download

# Step 2 β€” Pull and run directly
ollama run hf.co/tejasgowda05/Kanoonu-AI-Phi3-GGUF:Q4_K_M

Option 2 β€” llama.cpp CLI

llama-cli -hf tejasgowda05/Kanoonu-AI-Phi3-GGUF --jinja

Option 3 β€” llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path = "./kanoonu_model/phi-3-mini-4k-instruct.Q4_K_M.gguf",
    n_ctx      = 2048,
    n_threads  = 4,
)

response = llm(
    "<|system|>\nYou are Kanoonu AI, an expert Indian legal assistant.\n<|end|>\n"
    "<|user|>\nWhat is an FIR and how is it filed in India?<|end|>\n"
    "<|assistant|>\n",
    max_tokens = 200,
    stop       = ["<|end|>", "<|endoftext|>"],
)

print(response["choices"][0]["text"])
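
The raw prompt string above follows Phi-3's chat template; a small helper (hypothetical, not part of any library) keeps the special tokens in one place:

```python
# Hypothetical helper that assembles a Phi-3-style chat prompt.
# The <|system|>/<|user|>/<|assistant|>/<|end|> markers follow the
# template used in the example above.
def build_phi3_prompt(system: str, user: str) -> str:
    return (
        f"<|system|>\n{system}<|end|>\n"
        f"<|user|>\n{user}<|end|>\n"
        "<|assistant|>\n"
    )

prompt = build_phi3_prompt(
    "You are Kanoonu AI, an expert Indian legal assistant.",
    "What is an FIR and how is it filed in India?",
)
# pass `prompt` to llm(...) exactly as in the example above
```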

Option 4 β€” Python with ctransformers

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "tejasgowda05/Kanoonu-AI-Phi3-GGUF",
    model_file = "phi-3-mini-4k-instruct.Q4_K_M.gguf",
    model_type = "llama",  # this GGUF uses the llama architecture, not mistral
)

print(llm("What are the fundamental rights in the Indian Constitution?"))

πŸ’» Hardware Requirements

| Setup | Minimum RAM | Performance |
|-------|-------------|-------------|
| CPU only | 8 GB RAM | Slow (~1-2 tokens/sec) |
| CPU + 8 GB RAM | 8 GB RAM | Moderate (~3-5 tokens/sec) |
| GPU (4 GB VRAM) | 4 GB VRAM | Fast (~15-20 tokens/sec) |
| GPU (8 GB VRAM) | 8 GB VRAM | Very fast (~30+ tokens/sec) |
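
The RAM figures above are dominated by the model file plus the KV cache. A back-of-the-envelope estimate, assuming Phi-3-mini's published dimensions (32 layers, 32 KV heads, head size 96) and an FP16 cache at the 2048-token context from the example:

```python
# Back-of-the-envelope memory estimate for running the Q4_K_M file.
# Assumed Phi-3-mini dimensions: 32 layers, 32 KV heads, head_dim 96.
# Treat these as illustrative, not authoritative.
model_file_gb = 2.2
n_layers, n_kv_heads, head_dim = 32, 32, 96
n_ctx = 2048
bytes_per_elem = 2  # FP16 KV cache
# factor of 2 for the key and value tensors
kv_cache_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
total_gb = model_file_gb + kv_cache_bytes / 1e9
print(f"KV cache: ~{kv_cache_bytes / 1e9:.2f} GB, total: ~{total_gb:.1f} GB")
# prints: KV cache: ~0.81 GB, total: ~3.0 GB
```

The remainder of the 8 GB minimum is headroom for the OS and the inference runtime's own buffers.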

πŸ—οΈ How This Was Created

microsoft/Phi-3-mini-4k-instruct  (3.8B base model)
         ↓
   QLoRA Fine-tuning
   (23,370 Indian law Q&A pairs)
         ↓
tejasgowda05/Kanoonu-AI-Phi3-Finetuned  (LoRA adapters)
         ↓
   Merge LoRA β†’ Convert to GGUF β†’ Quantize Q4_K_M
   (via Unsloth)
         ↓
tejasgowda05/Kanoonu-AI-Phi3-GGUF  ← you are here

| Training Metric | Value |
|-----------------|-------|
| Final Train Loss | 0.3478 |
| Best Eval Loss | 0.6568 |
| Training Examples | 23,370 |
| Training Time | ~270 minutes |

πŸ”— Related Resources

| Resource | Link |
|----------|------|
| πŸ€— LoRA Adapter | tejasgowda05/Kanoonu-AI-Phi3-Finetuned |
| πŸ€— GGUF Model (this repo) | tejasgowda05/Kanoonu-AI-Phi3-GGUF |
| πŸ€— Formatted Dataset | tejasgowda05/Indian-Kanoonu-Dataset |
| πŸ“¦ Base Model | microsoft/Phi-3-mini-4k-instruct |
| πŸ“¦ Original Dataset | viber1/indian-law-dataset |

⚠️ Limitations & Disclaimer

  • This model is intended for educational and informational purposes only
  • It is not a substitute for professional legal advice
  • Always consult a qualified lawyer for legal matters
  • The model may occasionally produce inaccurate or outdated legal information

πŸ‘€ Author

Tejas Gowda N β€” tejasgowda05

Built as part of the Kanoonu AI project β€” making Indian legal information accessible through conversational AI.


πŸ“„ License

Apache 2.0 β€” inherited from microsoft/Phi-3-mini-4k-instruct and the original dataset.


Trained 2x faster with Unsloth
