🇮🇳 Indra – Indian Language AI Assistant

Indra is a fine-tuned LLM built on Qwen2.5-Coder-1.5B-Instruct, trained on Indian language and history datasets. It can converse in 10 Indian languages, answer questions about Indian history & culture, and still write code.

✨ Highlights

  • ๐Ÿ—ฃ๏ธ 10 Indian languages โ€” Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
  • ๐Ÿ“œ Indian history & culture โ€” From Indus Valley to modern India
  • ๐Ÿ”„ Bilingual conversations โ€” Hinglish, code-switching, vernacular queries
  • ๐Ÿ’ป Coding preserved โ€” Still writes Python, JavaScript, and full-stack code
  • ๐Ÿ“ฆ Lightweight โ€” 1.5B parameters, runs on consumer GPUs

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("RockySinghRajput/Indra", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("RockySinghRajput/Indra")

messages = [
    {"role": "system", "content": "आप Indra हैं, एक बुद्धिमान AI सहायक।"},  # "You are Indra, an intelligent AI assistant."
    {"role": "user", "content": "भारत के स्वतंत्रता संग्राम के बारे में बताइए।"}  # "Tell me about India's freedom struggle."
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0][len(inputs.input_ids[0]):], skip_special_tokens=True))
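For reference, `apply_chat_template` serializes the message list into Qwen's ChatML format before tokenization. A minimal sketch of that rendering (an approximation based on the Qwen2.5 chat-template convention; the template bundled with the tokenizer is authoritative):

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate the ChatML string apply_chat_template produces for Qwen models."""
    # Each turn becomes: <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([{"role": "user", "content": "Namaste!"}])
```

This is why `add_generation_prompt=True` matters in the snippet above: without the trailing assistant header, the model may continue the user turn instead of answering it.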

Hinglish Example

messages = [
    {"role": "system", "content": "You are Indra, a helpful AI that understands Hindi and English."},
    {"role": "user", "content": "Mujhe Python mein ek calculator banana hai, kaise karoon?"}  # "I want to build a calculator in Python – how do I do it?"
]
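Generation works exactly as in the Quick Start snippet. For follow-up questions, append each completed turn to the message list so the chat template sees the full conversation; a small sketch (the assistant text here is a placeholder, not real model output):

```python
def append_turn(history, role, content):
    # Return a new history with one more chat turn; roles follow the
    # standard system / user / assistant convention.
    return history + [{"role": role, "content": content}]

history = [
    {"role": "system", "content": "You are Indra, a helpful AI that understands Hindi and English."},
    {"role": "user", "content": "Mujhe Python mein ek calculator banana hai, kaise karoon?"},
]
history = append_turn(history, "assistant", "(model reply goes here)")
history = append_turn(history, "user", "Ab isme division bhi add karo.")  # "Now add division to it too."
```

Re-applying the chat template to the grown `history` on each turn keeps context across the conversation.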

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-1.5B-Instruct |
| Parameters | 1.5B |
| Type | Causal Language Model (merged LoRA fine-tune) |
| Languages | English, Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia |
| License | Apache 2.0 |
| Developed by | RockySinghRajput |
| Related Model | IndraCoder (coding-focused version) |

Supported Languages

| Language | Script | Code |
|---|---|---|
| Hindi | देवनागरी | hi |
| Bengali | বাংলা | bn |
| Tamil | தமிழ் | ta |
| Telugu | తెలుగు | te |
| Marathi | मराठी | mr |
| Gujarati | ગુજરાતી | gu |
| Kannada | ಕನ್ನಡ | kn |
| Malayalam | മലയാളം | ml |
| Punjabi | ਪੰਜਾਬੀ | pa |
| Odia | ଓଡ଼ିଆ | or |
| English | Latin | en |
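When routing requests or tagging outputs, the ISO 639-1 codes from the table can be kept in a small lookup. A hypothetical helper (the dict mirrors the table above; the function name is illustrative, not part of any API):

```python
# ISO 639-1 codes for Indra's supported languages, mirroring the table above.
INDRA_LANGUAGES = {
    "hi": "Hindi", "bn": "Bengali", "ta": "Tamil", "te": "Telugu",
    "mr": "Marathi", "gu": "Gujarati", "kn": "Kannada", "ml": "Malayalam",
    "pa": "Punjabi", "or": "Odia", "en": "English",
}

def is_supported(lang_code: str) -> bool:
    """True if Indra lists this ISO 639-1 code as a supported language."""
    return lang_code.lower() in INDRA_LANGUAGES
```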

Training Details

Training Data

Fine-tuned on 6 Indian language & culture datasets:

| Dataset | Purpose | Content |
|---|---|---|
| CohereForAI/aya_dataset | Multilingual Indian conversations | 10 Indian languages |
| Cognitive-Lab/Hindi-Instruct | Hindi instruction following | Hindi Q&A |
| sarvamai/samvaad-hi-en-instruct-v2 | Bilingual conversations | Hindi-English |
| CohereForAI/aya_collection (India-filtered) | Indian history & culture | History, heritage, knowledge |
| CohereForAI/aya_collection (Hindi WikiQA) | Hindi knowledge base | Wikipedia-sourced Hindi QA |
| ai4bharat/IndicSentiment | Hindi language understanding | Sentiment analysis |

Indian History Coverage

The model has been trained on Indian history spanning:

  • Ancient India – Indus Valley Civilization, Vedic period, Maurya & Gupta Empires
  • Medieval India – Delhi Sultanate, Mughal Empire, Vijayanagara, Maratha Empire, Bhakti & Sufi movements
  • Modern India – British Raj, freedom struggle, Independence, Republic
  • Indian Constitution – fundamental rights, governance, democracy
  • Culture & Heritage – art, architecture, literature, philosophy, classical music, Ayurveda, Yoga

Training Procedure

  • Method: LoRA (Low-Rank Adaptation), merged into the base model after training
  • LoRA Config: r=16, alpha=16, dropout=0.05
  • Target Modules: q_proj, k_proj, v_proj, o_proj
  • Epochs: 2
  • Learning Rate: 2e-5 (lower to preserve base capabilities)
  • Optimizer: paged_adamw_8bit
  • Sequence Length: 512 tokens
  • Precision: FP16 mixed precision
  • Quantization: 4-bit NF4 (QLoRA) during training
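The hyperparameters above roughly correspond to the following PEFT/bitsandbytes configuration. This is a sketch assuming training used the standard `peft` and `transformers` APIs; the actual training script is not published:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base model (QLoRA setup).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapters on the attention projections: r=16, alpha=16, dropout=0.05.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

After fine-tuning, the adapters were merged back into the base weights, which is why the published checkpoint loads as a plain causal LM with no `peft` dependency.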

Compute Infrastructure

  • Hardware: NVIDIA T4 GPU
  • Training Time: ~2 hours

Capabilities

✅ What Indra Can Do

  • Converse in Indian languages – answer questions in Hindi, Bengali, Tamil, Telugu, and more
  • Indian history & culture – detailed knowledge of Indian civilization
  • Hinglish/bilingual – handle mixed Hindi-English naturally
  • General knowledge – science, geography, current affairs with Indian context
  • Coding – write code in Python, JavaScript, and other languages
  • Sentiment analysis – understand sentiment in Hindi text

โš ๏ธ Limitations

  • 1.5B model โ€” Smaller than commercial LLMs; may produce shorter or less nuanced responses
  • Script limitations โ€” Stronger in Hindi/Devanagari; other Indian scripts may have lower quality
  • Not a translator โ€” Optimized for conversation, not professional translation
  • May hallucinate โ€” Always verify historical facts and generated content
  • English-centric base โ€” Indian language abilities are fine-tuned on top of an English-dominant base

โŒ Out-of-Scope Use

  • Professional translation services
  • Legal or medical advice
  • Factual source of record for academic research
  • Generating harmful or culturally insensitive content

Evaluation

| Test | Language | Task | Result |
|---|---|---|---|
| Hindi Chat | Hindi | Gandhi's role in freedom struggle | ✅ Detailed response |
| Indian History | English | Gupta Empire Golden Age | ✅ Accurate overview |
| Hinglish | Mixed | "Python mein calculator banana hai" | ✅ Code + Hindi explanation |
| Coding | English | Binary search implementation | ✅ Working code |
| Indian Culture | Hindi | Classical music ragas | ✅ Cultural knowledge |

Model Family

| Model | Focus | Repo |
|---|---|---|
| Indra (this model) | Indian languages + history | RockySinghRajput/Indra |
| IndraCoder | Coding + debugging | RockySinghRajput/Indracoder |
| IndraCoder-7B | Advanced coding + chat (coming soon) | RockySinghRajput/IndraCoder-7B |

Citation

@misc{indra2025,
  title={Indra: An Indian Language AI Assistant},
  author={RockySinghRajput},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/RockySinghRajput/Indra}
}
