Mizo Bible LoRA (NLLB-200-distilled-600M)
This is a LoRA adapter for Mizo, trained in two stages on top of
facebook/nllb-200-distilled-600M:
Stage 1 – Dictionary
- Data: Mizo dictionary pairs (English headword → Mizo explanation/definition), sketched below.
- Purpose: give the model strong coverage of modern Mizo vocabulary and word senses.
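The exact training format isn't published here; below is a minimal sketch of how such dictionary pairs could be tokenized for seq2seq fine-tuning. The example entries and the use of `text_target` are illustrative assumptions, not this repo's actual pipeline:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",  # English headword side
    tgt_lang="lus_Latn",  # Mizo definition side
)

# Hypothetical entries: (English headword, Mizo explanation/definition).
entries = [
    ("mountain", "tlang"),
    ("to rejoice", "lawm"),
]

# Each entry becomes one English -> Mizo seq2seq example; text_target
# tokenizes the Mizo side into decoder labels.
features = [
    tokenizer(src, text_target=tgt, truncation=True, max_length=128)
    for src, tgt in entries
]
```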
Stage 2 – Bible (Eng → Mizo)
- Data: verse-aligned English → Mizo Bible text.
- Training continues from the dictionary LoRA (stage 1), so Bible fine-tuning sits “on top” of the dictionary knowledge.
- The archaic conjunction “Tin”, pervasive in Bible verses, is down-weighted by cleaning it from the Mizo training targets, so the model does not overuse “Tin” in ordinary sentences. A sketch of both steps follows this list.
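Neither the clean-up rule nor the stage-2 setup is spelled out here; this is a minimal sketch under those assumptions, with a hypothetical adapter path and a simple sentence-initial regex standing in for the actual "Tin" clean-up:

```python
import re

from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM

# Hypothetical stand-in for the "Tin" clean-up: strip a sentence-initial
# "Tin," connective from Mizo target verses before training.
def clean_target(verse: str) -> str:
    return re.sub(r"^Tin,?\s+", "", verse)

base_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

# Resume from the stage-1 (dictionary) adapter instead of initializing
# fresh LoRA weights; is_trainable=True keeps the adapter updatable, so
# Bible fine-tuning stacks on top of the dictionary knowledge.
model = PeftModel.from_pretrained(
    base_model,
    "stage1-dictionary-lora",  # hypothetical local path to the stage-1 adapter
    is_trainable=True,
)
```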
Base model
This repository contains only the LoRA adapter weights, not the full base model; load them on top of facebook/nllb-200-distilled-600M as shown below.
Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel
import torch

BASE_MODEL = "facebook/nllb-200-distilled-600M"
LORA_REPO = "flt7007/nllb-mizo-bible-lora"  # update if different

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL,
    src_lang="eng_Latn",
    tgt_lang="lus_Latn",
)
base_model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base_model, LORA_REPO)
model.eval()

text = "In the beginning God created the heaven and the earth."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    gen = model.generate(
        **inputs,
        # NLLB needs the target language forced as the first decoder token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("lus_Latn"),
        max_new_tokens=80,
        num_beams=4,
    )
print(tokenizer.decode(gen[0], skip_special_tokens=True))
```
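Optionally, fold the adapter into the base weights with PEFT's merge_and_unload() for deployment, which removes the adapter indirection at generation time (the output directory name is just an example):

```python
# Merge the LoRA weights into the base model and save a standalone copy.
merged = model.merge_and_unload()
merged.save_pretrained("nllb-mizo-bible-merged")
tokenizer.save_pretrained("nllb-mizo-bible-merged")
```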