Mizo Bible LoRA (NLLB-200-distilled-600M)

This is a LoRA adapter for Mizo, trained in two stages on top of facebook/nllb-200-distilled-600M:

  1. Stage 1 – Dictionary
     • Data: Mizo dictionary pairs (English headword → Mizo explanation/definition).
     • Purpose: give the model broad coverage of modern Mizo vocabulary and word senses.

  2. Stage 2 – Bible (English → Mizo)
     • Data: verse-level English → Mizo alignment of the Bible.
     • Training continues from the stage-1 dictionary LoRA, so the Bible fine-tuning sits “on top” of the dictionary knowledge (a minimal sketch of this continuation follows the list).
     • The archaic conjunction “Tin” is down-weighted by removing it from the Mizo training targets, so the model does not overuse “Tin” in ordinary sentences.
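The stage-2 continuation described above can be reproduced roughly as follows. This is a minimal sketch, not the actual training script: the directory names, hyperparameters, and the toy dataset are illustrative, and the real verse-aligned corpus must be substituted in.

from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainingArguments, Seq2SeqTrainer,
                          DataCollatorForSeq2Seq)
from peft import PeftModel
from datasets import Dataset

BASE_MODEL = "facebook/nllb-200-distilled-600M"
STAGE1_DIR = "lora-stage1-dictionary"   # output of the stage-1 dictionary run (assumed path)

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL, src_lang="eng_Latn", tgt_lang="lus_Latn"
)
base = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
# is_trainable=True keeps the stage-1 LoRA weights updatable instead of frozen
model = PeftModel.from_pretrained(base, STAGE1_DIR, is_trainable=True)

def preprocess(batch):
    # text_target tokenizes the Mizo side under the tgt_lang setting
    return tokenizer(batch["en"], text_target=batch["lus"],
                     truncation=True, max_length=256)

# Placeholder pair – replace with the full verse-aligned Bible corpus
pairs = Dataset.from_dict({
    "en":  ["In the beginning God created the heaven and the earth."],
    "lus": ["<Mizo text of the verse>"],
})
train_ds = pairs.map(preprocess, batched=True, remove_columns=["en", "lus"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="lora-stage2-bible",
        per_device_train_batch_size=8,
        num_train_epochs=3,
        learning_rate=1e-4,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
model.save_pretrained("lora-stage2-bible")   # saves the adapter weights only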

Base model

This repository contains only the LoRA adapter weights, not the full base model; load facebook/nllb-200-distilled-600M separately and attach the adapter to it, as shown below.

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel
import torch

BASE_MODEL = "facebook/nllb-200-distilled-600M"
LORA_REPO  = "frankiethiak/nllb-mizo-bible-lora"   # update if different

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL,
    src_lang="eng_Latn",
    tgt_lang="lus_Latn",
)

base_model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base_model, LORA_REPO)
model.eval()

text = "In the beginning God created the heaven and the earth."
inputs = tokenizer(text, return_tensors="pt")

# NLLB needs the target language forced as the first generated token,
# otherwise the output may not come out in Mizo.
lus_id = tokenizer.convert_tokens_to_ids("lus_Latn")

with torch.no_grad():
    gen = model.generate(
        **inputs,
        forced_bos_token_id=lus_id,
        max_new_tokens=80,
        num_beams=4,
    )

print(tokenizer.decode(gen[0], skip_special_tokens=True))
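
If you prefer to serve the model with plain transformers (no peft dependency at inference time), the adapter can be merged into the base weights. A minimal sketch; the output directory name is only an example:

# Optional: fold the LoRA weights into the base model for peft-free serving
merged = model.merge_and_unload()
merged.save_pretrained("nllb-mizo-bible-merged")     # example path
tokenizer.save_pretrained("nllb-mizo-bible-merged")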