oxford-pets-3class-vit

This model is a fine-tuned version of google/vit-base-patch16-224-in21k for a simplified pet image classification task.

It was trained on a custom 3-class subset of the Oxford-IIIT Pet dataset with the following classes:

Egyptian Mau
leonberger
samoyed

Model description

This is a transfer-learning model created for an educational computer vision project.

The goal of the project was to compare:

a fine-tuned ViT model
a zero-shot CLIP model
a closed-source OpenAI vision model

The model is designed to classify images into one of the three selected pet classes.
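
For the zero-shot CLIP side of the comparison, a minimal sketch using the `transformers` zero-shot image classification pipeline could look like the following. The CLIP checkpoint (`openai/clip-vit-base-patch32`) and the prompt template are assumptions; this card does not state which checkpoint or prompts the project used.

```python
CLASSES = ["Egyptian Mau", "leonberger", "samoyed"]


def top_label(results):
    """Return the label with the highest score from a list of
    {"label": str, "score": float} dicts (the pipeline's output format)."""
    return max(results, key=lambda r: r["score"])["label"]


def clip_zero_shot(image, classes=CLASSES,
                   model_id="openai/clip-vit-base-patch32"):
    """Classify one image with CLIP, with no fine-tuning.

    `model_id` is an assumed checkpoint, not necessarily the one
    used in the original project.
    """
    # Imported lazily so top_label() works without transformers installed.
    from transformers import pipeline

    clf = pipeline("zero-shot-image-classification", model=model_id)
    results = clf(
        image,
        candidate_labels=classes,
        hypothesis_template="a photo of a {}, a type of pet",  # assumed prompt
    )
    return top_label(results)
```

Calling `clip_zero_shot("some_image.jpg")` downloads the CLIP weights on first use and returns one of the three class names.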

Intended uses & limitations

Intended use

This model is intended for:

educational use
demonstration of transfer learning
comparison against CLIP and OpenAI vision models
classification of images belonging to the selected 3 classes

Limitations

This model has important limitations:

it was trained on a very small dataset (60 training images, 20 per class)
it only supports 3 classes
it is not suitable for real-world production use
predictions on unrelated animals or unseen categories may be unreliable
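
A minimal inference sketch, assuming the repository `vasanthi8134/oxford-pets-3class-vit` ships an image processor config alongside the weights. Because the model always outputs one of its three classes, the sketch also includes a hypothetical score threshold (`min_score`, not part of the original project) that abstains on low-confidence predictions. This is only a rough guard: a softmax score is not a reliable out-of-distribution detector.

```python
def pick_label(predictions, min_score=0.8):
    """Pick the top label from pipeline output, or None if not confident.

    predictions: list of {"label": str, "score": float} dicts.
    min_score:   hypothetical abstention threshold (an assumption).
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None


def classify(image, model_id="vasanthi8134/oxford-pets-3class-vit",
             min_score=0.8):
    """Run the fine-tuned ViT on one image (path, URL, or PIL image)."""
    # Imported lazily so pick_label() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return pick_label(classifier(image), min_score=min_score)
```

`classify("my_pet.jpg")` returns a class name, or `None` when the top score falls below the threshold.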

Training and evaluation data

Dataset source

Hugging Face dataset loader: load_dataset("pcuenq/oxford-pets")

Dataset used in this project

A custom subset was created from the Oxford-IIIT Pet dataset.

Selected classes:

Egyptian Mau
leonberger
samoyed

Dataset size

Total images: 90
Train: 60 images total (20 per class)
Validation: 15 images total (5 per class)
Test: 15 images total (5 per class)
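
One way to build a split of this shape is a per-class random sample. The sketch below is an illustration, not the project's actual code; the function name, the seed, and the sampling strategy are assumptions.

```python
import random
from collections import defaultdict

CLASSES = ["Egyptian Mau", "leonberger", "samoyed"]


def split_per_class(labels, classes=CLASSES,
                    n_train=20, n_val=5, n_test=5, seed=42):
    """Return index lists for train/validation/test, balanced per class.

    labels: one class name per dataset row, in dataset order.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        if label in classes:
            by_class[label].append(idx)

    needed = n_train + n_val + n_test
    splits = {"train": [], "validation": [], "test": []}
    for name in classes:
        idxs = by_class[name]
        if len(idxs) < needed:
            raise ValueError(f"class {name!r} has only {len(idxs)} images")
        rng.shuffle(idxs)
        splits["train"] += idxs[:n_train]
        splits["validation"] += idxs[n_train:n_train + n_val]
        splits["test"] += idxs[n_train + n_val:needed]
    return splits
```

With the Hugging Face dataset, the indices could then be applied with `ds.select(splits["train"])`, where `ds = load_dataset("pcuenq/oxford-pets", split="train")` and `labels = ds["label"]`. The column name `label` is an assumption about that dataset's schema.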

Preprocessing

Training transforms

Random resized crop
Random horizontal flip
Conversion to tensor
Normalization with ViT image processor values

Validation / test transforms

Resize
Center crop
Conversion to tensor
Normalization with ViT image processor values

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-5
train_batch_size: 8
eval_batch_size: 8
num_epochs: 5
optimizer: AdamW (Trainer default)
best model selected using validation accuracy
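
To put these settings in perspective, the number of optimizer steps can be worked out from the training set size. The sketch assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these are stated in this card.

```python
import math

hparams = {
    "learning_rate": 5e-5,
    "train_batch_size": 8,
    "eval_batch_size": 8,
    "num_epochs": 5,
}


def training_steps(n_train_images, batch_size, num_epochs):
    """Return (steps_per_epoch, total_steps), keeping the last partial batch."""
    steps_per_epoch = math.ceil(n_train_images / batch_size)
    return steps_per_epoch, steps_per_epoch * num_epochs


per_epoch, total = training_steps(
    60, hparams["train_batch_size"], hparams["num_epochs"]
)
print(per_epoch, total)  # 8 40
```

Under these assumptions the whole run is only 40 weight updates, which is typical for a small transfer-learning demonstration.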

Training results

Final evaluation

Validation accuracy: 1.0
Test accuracy: 1.0

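Because the task was simplified to only 3 classes, the model performs very well on this limited setup. With only 15 images in each of the validation and test sets, an accuracy of 1.0 means 15 of 15 images were classified correctly and should not be read as a reliable estimate of performance on new images.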
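
To illustrate how much uncertainty a 15-image test set leaves, here is a small sketch that computes a 95% Wilson score interval for the reported accuracy. The choice of the Wilson interval is ours, not part of the original evaluation.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(15, 15)
print(f"15/15 correct, 95% Wilson interval: [{low:.3f}, {high:.3f}]")
```

For 15 of 15 correct, the lower bound is about 0.80, so the result is consistent with a true accuracy noticeably below 1.0.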

Example prediction behavior

Example: leonberger image

ViT model: leonberger
CLIP: leonberger
OpenAI: leonberger

Example: Egyptian Mau image

ViT model: Egyptian Mau
CLIP: Egyptian Mau
OpenAI: Egyptian Mau

Related resources

Model repository: vasanthi8134/oxford-pets-3class-vit
Base model: google/vit-base-patch16-224-in21k
Weights format: Safetensors
Model size: 85.8M params
Tensor type: F32