nayana-qwen3vl-4b-stage3-section

Fine-tuned Vision-Language Model using MS-Swift and LoRA adapters.

Model Details

  • Base Model: v1v1d1/nayana-qwen3vl-4b-stage2
  • Training Method: LoRA (Low-Rank Adaptation)
  • Framework: MS-Swift
  • Model Size: 4B parameters (BF16)
  • Languages: kn (Kannada)
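LoRA, named above, trains only a low-rank update to each frozen weight matrix: W' = W + (alpha/r) · B·A, where A (r × d) and B (d × r) are the small trained matrices. A minimal pure-Python sketch with toy sizes (illustrative only; real shapes come from the 4B base model):

```python
# Toy illustration of a LoRA update: W' = W + (alpha / r) * (B @ A).
# Tiny sizes (d=3, r=1) keep the arithmetic visible.
d, r, alpha = 3, 1, 2

W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]          # frozen base weight (d x d)
A = [[1.0, 2.0, 3.0]]          # trained down-projection (r x d)
B = [[1.0], [0.0], [1.0]]      # trained up-projection (d x r)

scaling = alpha / r            # the card's r=16, alpha=32 also gives scaling 2

# W' = W + scaling * (B @ A)
W_prime = [
    [W[i][j] + scaling * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d)]
    for i in range(d)
]
print(W_prime[0])  # [3.0, 4.0, 6.0]
```

Only A and B are updated during training, which is why a 4B-parameter model can be fine-tuned with a small adapter.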

Training Details

Dataset

  • Dataset: VIVID Section Level kn
  • Size: N/A
  • Languages: kn (Kannada)

Training Hyperparameters

Parameter              Value
---------------------  -----
LoRA Rank              16
LoRA Alpha             32
Batch Size             4
Learning Rate          1e-4
Epochs                 3
Gradient Accumulation  4

LoRA Configuration

  • Target Modules: all-linear
  • Freeze ViT: True
  • Freeze Aligner: True
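Taken together, the hyperparameters and LoRA settings above correspond roughly to an MS-Swift launch like the following. This is a hedged sketch, not the exact training command: flag names follow recent ms-swift releases and the dataset path is a placeholder, so verify both against your installed version.

```shell
# Illustrative ms-swift LoRA fine-tune matching the settings above (not the exact command used)
swift sft \
    --model v1v1d1/nayana-qwen3vl-4b-stage2 \
    --train_type lora \
    --lora_rank 16 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --freeze_vit true \
    --freeze_aligner true \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-4 \
    --num_train_epochs 3 \
    --dataset <path-to-vivid-section-level-kn-data>
```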

Usage

With vLLM (Recommended)

from vllm import LLM, SamplingParams

llm = LLM(
    model="v1v1d1/nayana-qwen3vl-4b-stage3-section",
    gpu_memory_utilization=0.8,
    max_model_len=8192,
)

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=512,
)

# Example with an image; vLLM expects an http(s) URL or a base64 data URI here
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "path/to/image.jpg"}},
        {"type": "text", "text": "Describe this image in detail."}
    ]
}]

outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)

With Transformers

from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image

model = AutoModelForVision2Seq.from_pretrained(
    "v1v1d1/nayana-qwen3vl-4b-stage3-section",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("v1v1d1/nayana-qwen3vl-4b-stage3-section")

image = Image.open("path/to/image.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in detail."}
    ]}
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))

Citation

@misc{v1v1d1-nayana-qwen3vl-4b-stage3-section,
  author = {Nayana Project},
  title = {nayana-qwen3vl-4b-stage3-section},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {https://huggingface.co/v1v1d1/nayana-qwen3vl-4b-stage3-section}
}

Training Pipeline

This model was trained using the VIVID Dataset Pipeline.

Generated with 🤖 Claude Code
