# 🇮🇳 Indra: Indian Language AI Assistant
Indra is a fine-tuned LLM built on Qwen2.5-Coder-1.5B-Instruct, trained on Indian language and history datasets. It can converse in 10 Indian languages, answer questions about Indian history & culture, and still write code.
## ✨ Highlights
- 🗣️ 10 Indian languages: Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
- Indian history & culture: from the Indus Valley Civilization to modern India
- Bilingual conversations: Hinglish, code-switching, vernacular queries
- 💻 Coding preserved: still writes Python, JavaScript, and full-stack code
- 📦 Lightweight: 1.5B parameters, runs on consumer GPUs
## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "RockySinghRajput/Indra", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("RockySinghRajput/Indra")

messages = [
    # "You are Indra, an intelligent AI assistant."
    {"role": "system", "content": "आप Indra हैं, एक बुद्धिमान AI सहायक।"},
    # "Tell me about India's freedom struggle."
    {"role": "user", "content": "भारत के स्वतंत्रता संग्राम के बारे में बताइए।"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# temperature/top_p only take effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
### Hinglish Example

```python
messages = [
    {"role": "system", "content": "You are Indra, a helpful AI that understands Hindi and English."},
    {"role": "user", "content": "Mujhe Python mein ek calculator banana hai, kaise karoon?"},
]
```
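For reference, Qwen2.5-family tokenizers render these messages in the ChatML format. The helper below reproduces that layout by hand so the prompt structure is visible without loading the tokenizer; it is a sketch, and `apply_chat_template` remains the source of truth.

```python
# Build a ChatML prompt string by hand (what apply_chat_template with
# add_generation_prompt=True produces for Qwen2.5-style models).
def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Trailing assistant header asks the model to generate the reply.
    return prompt + "<|im_start|>assistant\n"

messages = [
    {"role": "system", "content": "You are Indra, a helpful AI that understands Hindi and English."},
    {"role": "user", "content": "Mujhe Python mein ek calculator banana hai, kaise karoon?"},
]
print(to_chatml(messages))
```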
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-1.5B-Instruct |
| Parameters | 1.5B |
| Type | Causal Language Model (merged LoRA fine-tune) |
| Languages | English, Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia |
| License | Apache 2.0 |
| Developed by | RockySinghRajput |
| Related Model | IndraCoder (coding-focused version) |
## Supported Languages

| Language | Script | Code |
|---|---|---|
| Hindi | देवनागरी | hi |
| Bengali | বাংলা | bn |
| Tamil | தமிழ் | ta |
| Telugu | తెలుగు | te |
| Marathi | मराठी | mr |
| Gujarati | ગુજરાતી | gu |
| Kannada | ಕನ್ನಡ | kn |
| Malayalam | മലയാളം | ml |
| Punjabi | ਪੰਜਾਬੀ | pa |
| Odia | ଓଡ଼ିଆ | or |
| English | Latin | en |
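The ISO 639-1 codes above pair naturally with the Unicode blocks of the Indic scripts, which is enough for simple language routing. The sketch below is illustrative and not part of the model: the block ranges come from the Unicode standard, and since Hindi and Marathi share Devanagari, the script alone cannot distinguish them (Hindi is used as the default).

```python
# Map the table's ISO 639-1 codes to languages, and guess a query's
# language from the Unicode block of its first Indic character.
LANG_CODES = {
    "hi": "Hindi", "bn": "Bengali", "ta": "Tamil", "te": "Telugu",
    "mr": "Marathi", "gu": "Gujarati", "kn": "Kannada", "ml": "Malayalam",
    "pa": "Punjabi", "or": "Odia", "en": "English",
}

# (low, high) codepoint range of each script block -> ISO code.
SCRIPT_RANGES = [
    ((0x0900, 0x097F), "hi"),  # Devanagari (Hindi/Marathi)
    ((0x0980, 0x09FF), "bn"),  # Bengali
    ((0x0A00, 0x0A7F), "pa"),  # Gurmukhi (Punjabi)
    ((0x0A80, 0x0AFF), "gu"),  # Gujarati
    ((0x0B00, 0x0B7F), "or"),  # Odia
    ((0x0B80, 0x0BFF), "ta"),  # Tamil
    ((0x0C00, 0x0C7F), "te"),  # Telugu
    ((0x0C80, 0x0CFF), "kn"),  # Kannada
    ((0x0D00, 0x0D7F), "ml"),  # Malayalam
]

def detect_language(text: str) -> str:
    """Return the ISO code of the first Indic script found, else 'en'."""
    for ch in text:
        for (lo, hi), code in SCRIPT_RANGES:
            if lo <= ord(ch) <= hi:
                return code
    return "en"

print(detect_language("भारत के बारे में बताइए"))  # hi
```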
## Training Details

### Training Data
Fine-tuned on 6 Indian language & culture datasets:
| Dataset | Purpose | Content |
|---|---|---|
| CohereForAI/aya_dataset | Multilingual Indian conversations | 10 Indian languages |
| Cognitive-Lab/Hindi-Instruct | Hindi instruction following | Hindi Q&A |
| sarvamai/samvaad-hi-en-instruct-v2 | Bilingual conversations | Hindi-English |
| CohereForAI/aya_collection (India-filtered) | Indian history & culture | History, heritage, knowledge |
| CohereForAI/aya_collection (Hindi WikiQA) | Hindi knowledge base | Wikipedia-sourced Hindi QA |
| ai4bharat/IndicSentiment | Hindi language understanding | Sentiment analysis |
### Indian History Coverage
The model has been trained on Indian history spanning:
- Ancient India: Indus Valley Civilization, Vedic period, Maurya & Gupta Empires
- Medieval India: Delhi Sultanate, Mughal Empire, Vijayanagara, Maratha Empire, Bhakti & Sufi movements
- Modern India: British Raj, freedom struggle, Independence, the Republic
- Indian Constitution: fundamental rights, governance, democracy
- Culture & Heritage: art, architecture, literature, philosophy, classical music, Ayurveda, Yoga
### Training Procedure

- Method: LoRA (Low-Rank Adaptation), merged into the base model
- LoRA Config: r=16, alpha=16, dropout=0.05
- Target Modules: q_proj, k_proj, v_proj, o_proj
- Epochs: 2
- Learning Rate: 2e-5 (lower to preserve base capabilities)
- Optimizer: paged_adamw_8bit
- Sequence Length: 512 tokens
- Precision: FP16 mixed precision
- Quantization: 4-bit NF4 (QLoRA) during training
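With r=16 on the four attention projections, the adapter is a tiny fraction of the model. The arithmetic below estimates the trainable parameter count; the layer dimensions (28 layers, hidden size 1536, 12 query heads and 2 KV heads of dimension 128) are assumed from the published Qwen2.5-1.5B config, not stated in this card.

```python
# Estimate LoRA trainable parameters: each adapted Linear(in, out) adds
# r*in (matrix A) + r*out (matrix B) weights. Dimensions are assumed from
# the Qwen2.5-1.5B config; adjust if the actual config differs.
r = 16
num_layers = 28
hidden = 1536            # model width
head_dim = 128
q_out = 12 * head_dim    # 12 query heads -> 1536
kv_out = 2 * head_dim    # 2 KV heads (grouped-query attention) -> 256

def lora_params(in_dim, out_dim, r):
    return r * in_dim + r * out_dim

per_layer = (
    lora_params(hidden, q_out, r)     # q_proj
    + lora_params(hidden, kv_out, r)  # k_proj
    + lora_params(hidden, kv_out, r)  # v_proj
    + lora_params(q_out, hidden, r)   # o_proj
)
total = per_layer * num_layers
print(f"{total:,} trainable LoRA parameters (~{100 * total / 1.5e9:.2f}% of 1.5B)")
```

Under these assumptions the adapter trains roughly 4.4M parameters, about 0.3% of the base model, which is why a single T4 suffices.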
### Compute Infrastructure
- Hardware: NVIDIA T4 GPU
- Training Time: ~2 hours
## Capabilities

### ✅ What Indra Can Do
- Converse in Indian languages: answers questions in Hindi, Bengali, Tamil, Telugu, and more
- Indian history & culture: detailed knowledge of Indian civilization
- Hinglish/bilingual: handles mixed Hindi-English naturally
- General knowledge: science, geography, current affairs with Indian context
- Coding: writes code in Python, JavaScript, and other languages
- Sentiment analysis: understands sentiment in Hindi text
### ⚠️ Limitations
- 1.5B model: smaller than commercial LLMs; may produce shorter or less nuanced responses
- Script limitations: strongest in Hindi/Devanagari; output quality may be lower in other Indic scripts
- Not a translator: optimized for conversation, not professional translation
- May hallucinate: always verify historical facts and generated content
- English-centric base: Indian-language abilities are fine-tuned on top of an English-dominant base model
### ❌ Out-of-Scope Use
- Professional translation services
- Legal or medical advice
- Factual source of record for academic research
- Generating harmful or culturally insensitive content
## Evaluation
| Test | Language | Task | Result |
|---|---|---|---|
| Hindi Chat | Hindi | Gandhi's role in the freedom struggle | ✅ Detailed response |
| Indian History | English | Gupta Empire Golden Age | ✅ Accurate overview |
| Hinglish | Mixed | "Python mein calculator banana hai" | ✅ Code + Hindi explanation |
| Coding | English | Binary search implementation | ✅ Working code |
| Indian Culture | Hindi | Classical music ragas | ✅ Cultural knowledge |
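These spot-checks were run manually. A minimal way to script similar keyword-based checks is sketched below; the `generate` stub and the keywords are illustrative, not the card's actual test suite, and the stub would be swapped for the Quick Start generation pipeline to test the real model.

```python
# Minimal keyword-based spot-check harness. `generate` is a stub standing
# in for a call to model.generate via the Quick Start pipeline.
CHECKS = [
    {"prompt": "Mujhe Python mein calculator banana hai", "must_contain": ["def"]},
    {"prompt": "Implement binary search in Python", "must_contain": ["def", "mid"]},
]

def generate(prompt: str) -> str:
    """Stub: replace with a real model call to evaluate Indra itself."""
    canned = {
        "Mujhe Python mein calculator banana hai": "def add(a, b): return a + b",
        "Implement binary search in Python": "def search(xs, t):\n    mid = len(xs) // 2",
    }
    return canned.get(prompt, "")

def run_checks(checks):
    """Return (prompt, passed) pairs; a check passes if every keyword appears."""
    results = []
    for c in checks:
        reply = generate(c["prompt"])
        ok = all(kw in reply for kw in c["must_contain"])
        results.append((c["prompt"], ok))
    return results

for prompt, ok in run_checks(CHECKS):
    print(("PASS" if ok else "FAIL"), prompt)
```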
## Model Family
| Model | Focus | Repo |
|---|---|---|
| Indra (this model) | Indian languages + history | RockySinghRajput/Indra |
| IndraCoder | Coding + debugging | RockySinghRajput/Indracoder |
| IndraCoder-7B | Advanced coding + chat (coming soon) | RockySinghRajput/IndraCoder-7B |
## Citation

```bibtex
@misc{indra2025,
  title={Indra: An Indian Language AI Assistant},
  author={RockySinghRajput},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/RockySinghRajput/Indra}
}
```
## Contact
- HuggingFace: RockySinghRajput