nielsr HF Staff committed on
Commit b774c06 · verified · 1 Parent(s): 3f20e57

Add pipeline tag, library name, and improve model card


Hi! I'm Niels from the Hugging Face community science team.

This pull request improves your model card by adding the `pipeline_tag` and `library_name` to the metadata. These tags help users discover your model more easily and enable automated code snippets on the Hub. I've also updated the model card with structured links to the paper, project page, and code repository to provide more context for users.

Please review and merge if this looks good to you!
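For context, below is a rough sketch of the kind of loading snippet the Hub can auto-generate once `pipeline_tag: image-text-to-text` and `library_name: transformers` are set. The repository id and image URLs are placeholders I assumed from the model card title, and depending on how the checkpoint is packaged it may instead need the InternVL-specific loading path described in the upstream repository:

```python
# Minimal sketch, not the official usage: queries the checkpoint through the
# generic `image-text-to-text` pipeline that the new `pipeline_tag` enables.
# The repo id and image URLs below are hypothetical placeholders.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="glab-caltech/TWIN-InternVL3_5-1B",  # hypothetical repo id
    trust_remote_code=True,  # may be required if the checkpoint ships custom code
)

# A TWIN-style query: do two visually similar images show the same object?
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/painting_a.jpg"},
            {"type": "image", "url": "https://example.com/painting_b.jpg"},
            {"type": "text", "text": "Do these two images depict the same object? Answer yes or no."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=32, return_full_text=False)
print(out[0]["generated_text"])
```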

Files changed (1)
  1. README.md +18 -10
README.md CHANGED
@@ -1,30 +1,38 @@
 ---
+base_model:
+- OpenGVLab/InternVL3_5-1B-Instruct
 language:
 - en
+license: mit
 metrics:
 - accuracy
-base_model:
-- OpenGVLab/InternVL3_5-1B-Instruct
 tags:
 - visual-reasoning
 - fine-grained-vqa
 - fine-grained-recognition
-license: mit
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
-# Model Card for TWIN-Qwen2.5-VL-3B
 
-<!-- Provide a quick summary of what the model is/does. -->
+# Model Card for TWIN-InternVL3_5-1B
+
+This repository contains the InternVL3.5-1B model post-trained on the TWIN dataset, as introduced in the paper [Same or Not? Enhancing Visual Perception in Vision-Language Models](https://arxiv.org/abs/2512.23592).
+
+TWIN is a large-scale dataset of 561,000 image-pair queries designed to enhance the perceptual abilities of Vision-Language Models (VLMs). It tasks models to determine whether two visually similar images depict the same object, encouraging attention to nuanced visual cues. Fine-tuning on TWIN yields significant gains in fine-grained recognition across various domains like art, animals, plants, and landmarks.
 
-This is the InternVL3_5-1B model post-trained on the TWIN dataset from the paper: [Same or Not? Enhancing Visual Perception in Vision-Language Models](https://glab-caltech.github.io/twin/)
+## Resources
 
-For further information please refer to the [project webpage](https://glab-caltech.github.io/twin/), [paper](https://arxiv.org/abs/2512.23592), and [repository](https://github.com/damianomarsili/TWIN).
+- **Project Page:** [https://glab-caltech.github.io/twin/](https://glab-caltech.github.io/twin/)
+- **Paper:** [Same or Not? Enhancing Visual Perception in Vision-Language Models](https://arxiv.org/abs/2512.23592)
+- **Code Repository:** [https://github.com/damianomarsili/TWIN](https://github.com/damianomarsili/TWIN)
+- **Dataset:** [glab-caltech/TWIN](https://huggingface.co/datasets/glab-caltech/TWIN)
+- **Benchmark Suite:** [glab-caltech/FGVQA](https://huggingface.co/datasets/glab-caltech/FGVQA)
 
 ## Citation
 
-If you use TWIN in your research, please consider citing our work:
+If you use TWIN in your research, please consider citing the work:
 
-**BibTeX:**
-```
+```bibtex
 @misc{marsili2025notenhancingvisualperception,
 title={Same or Not? Enhancing Visual Perception in Vision-Language Models},
 author={Damiano Marsili and Aditya Mehta and Ryan Y. Lin and Georgia Gkioxari},