nayana-qwen3vl-2b-stage1

Fine-tuned Vision-Language Model using MS-Swift and LoRA adapters.

Model Details

  • Base Model: Qwen/Qwen3-VL-2B-Instruct
  • Training Method: LoRA (Low-Rank Adaptation)
  • Framework: MS-Swift
  • Model Size: ~2B parameters (BF16, Safetensors)
  • Languages: English (en), Kannada (kn), Hindi (hi)

Training Details

Dataset

  • Dataset: Nayana Docmatix Stage 1
  • Size: 150k samples
  • Languages: English (en), Kannada (kn), Hindi (hi)

Training Hyperparameters

Parameter                Value
LoRA Rank                16
LoRA Alpha               32
Batch Size (per device)  2
Learning Rate            1e-4
Epochs                   1.0
Gradient Accumulation    4
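The batch settings above imply an effective batch size of 8 and roughly 18,750 optimizer steps over the one-epoch run (assuming single-device training; multiply by the number of devices otherwise):

```python
# Effective batch size and step count implied by the hyperparameters above
# (assumes a single GPU; scale by world size for multi-GPU runs).
per_device_batch = 2
grad_accum_steps = 4
dataset_size = 150_000
epochs = 1.0

effective_batch = per_device_batch * grad_accum_steps
steps = int(dataset_size // effective_batch * epochs)

print(effective_batch)  # 8
print(steps)            # 18750
```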

LoRA Configuration

  • Target Modules: all-linear (LoRA applied to every linear layer)
  • Freeze ViT: True (vision tower weights are not updated)
  • Freeze Aligner: True (vision-language projector weights are not updated)
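The adapter settings above correspond roughly to the following LoRA configuration, sketched here in peft terms for illustration; MS-Swift builds an equivalent config internally and handles the ViT/aligner freezing through its own training flags rather than through the LoRA config itself:

```python
from peft import LoraConfig

# Sketch of the adapter settings in peft terms (not the exact MS-Swift
# invocation). MS-Swift freezes the vision tower and aligner separately
# via its own options, so only the LoRA-specific values appear here.
lora_config = LoraConfig(
    r=16,                         # LoRA Rank
    lora_alpha=32,                # LoRA Alpha
    target_modules="all-linear",  # apply LoRA to every linear layer
    task_type="CAUSAL_LM",
)
```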

Usage

With vLLM (Recommended)

from vllm import LLM, SamplingParams

llm = LLM(
    model="v1v1d1/nayana-qwen3vl-2b-stage1",
    gpu_memory_utilization=0.8,
    max_model_len=8192,
)

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=512,
)

# Example with an image; vLLM resolves image_url like an OpenAI-style
# chat request, so the URL should be reachable (or a base64 data URI).
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "path/to/image.jpg"}},
        {"type": "text", "text": "Describe this image in detail."}
    ]
}]

outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)

With Transformers

from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image

model = AutoModelForVision2Seq.from_pretrained(
    "v1v1d1/nayana-qwen3vl-2b-stage1",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("v1v1d1/nayana-qwen3vl-2b-stage1")

image = Image.open("path/to/image.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in detail."}
    ]}
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(outputs[0], skip_special_tokens=True))

Citation

@misc{v1v1d1-nayana-qwen3vl-2b-stage1,
  author = {Nayana Project},
  title = {nayana-qwen3vl-2b-stage1},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {https://huggingface.co/v1v1d1/nayana-qwen3vl-2b-stage1}
}

Training Pipeline

This model was trained using the VIVID Dataset Pipeline.

