nayana-qwen3vl-4b-stage3-section

Fine-tuned Vision-Language Model using MS-Swift and LoRA adapters.

Model Details

  • Base Model: v1v1d1/nayana-qwen3vl-4b-stage2
  • Training Method: LoRA (Low-Rank Adaptation)
  • Framework: MS-Swift
  • Model Size: 4B parameters (BF16)
  • Languages: kn (Kannada)
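LoRA, named above, trains only a low-rank update to each frozen weight matrix: W' = W + (alpha/r) · B·A, where A (r × d) and B (d × r) are the small trained matrices. A minimal pure-Python sketch with toy sizes (illustrative only; real shapes come from the 4B base model):

```python
# Toy illustration of a LoRA update: W' = W + (alpha / r) * (B @ A).
# Tiny sizes (d=3, r=1) keep the arithmetic visible.
d, r, alpha = 3, 1, 2

W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]          # frozen base weight (d x d)
A = [[1.0, 2.0, 3.0]]          # trained down-projection (r x d)
B = [[1.0], [0.0], [1.0]]      # trained up-projection (d x r)

scaling = alpha / r            # the card's r=16, alpha=32 also gives scaling 2

# W' = W + scaling * (B @ A)
W_prime = [
    [W[i][j] + scaling * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d)]
    for i in range(d)
]
print(W_prime[0])  # [3.0, 4.0, 6.0]
```

Only A and B are updated during training, which is why a 4B-parameter model can be fine-tuned with a small adapter.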

Training Details

Dataset

  • Dataset: VIVID Section Level kn
  • Size: N/A
  • Languages: kn (Kannada)

Training Hyperparameters

Parameter              Value
---------------------  -----
LoRA Rank              16
LoRA Alpha             32
Batch Size             4
Learning Rate          1e-4
Epochs                 3
Gradient Accumulation  4

LoRA Configuration

  • Target Modules: all-linear
  • Freeze ViT: True
  • Freeze Aligner: True
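Taken together, the hyperparameters and LoRA settings above correspond roughly to an MS-Swift launch like the following. This is a hedged sketch, not the exact training command: flag names follow recent ms-swift releases and the dataset path is a placeholder, so verify both against your installed version.

```shell
# Illustrative ms-swift LoRA fine-tune matching the settings above (not the exact command used)
swift sft \
    --model v1v1d1/nayana-qwen3vl-4b-stage2 \
    --train_type lora \
    --lora_rank 16 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --freeze_vit true \
    --freeze_aligner true \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-4 \
    --num_train_epochs 3 \
    --dataset <path-to-vivid-section-level-kn-data>
```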

Usage

With vLLM (Recommended)

from vllm import LLM, SamplingParams

llm = LLM(
    model="v1v1d1/nayana-qwen3vl-4b-stage3-section",
    gpu_memory_utilization=0.8,
    max_model_len=8192,
)

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=512,
)

# Example with an image; vLLM expects an http(s) URL or a base64 data URI here
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "path/to/image.jpg"}},
        {"type": "text", "text": "Describe this image in detail."}
    ]
}]

outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)

With Transformers

from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image

model = AutoModelForVision2Seq.from_pretrained(
    "v1v1d1/nayana-qwen3vl-4b-stage3-section",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("v1v1d1/nayana-qwen3vl-4b-stage3-section")

image = Image.open("path/to/image.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in detail."}
    ]}
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))

Citation

@misc{v1v1d1-nayana-qwen3vl-4b-stage3-section,
  author = {Nayana Project},
  title = {nayana-qwen3vl-4b-stage3-section},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {https://huggingface.co/v1v1d1/nayana-qwen3vl-4b-stage3-section}
}

Training Pipeline

This model was trained using the VIVID Dataset Pipeline.

Generated with 🤖 Claude Code
