oxford-pets-3class-vit

This model is a fine-tuned version of google/vit-base-patch16-224-in21k for a simplified pet image classification task.

It was trained on a custom 3-class subset of the Oxford-IIIT Pet dataset with the following classes:

  • Egyptian Mau
  • leonberger
  • samoyed

Model description

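This model applies transfer learning: a ViT backbone pretrained on ImageNet-21k is fine-tuned with a new 3-class classification head. It was created for an educational computer vision project.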

The goal of the project was to compare:

  • a fine-tuned ViT model
  • a zero-shot CLIP model
  • a closed-source OpenAI vision model

The model is designed to classify images into one of the three selected pet classes.

Intended uses & limitations

Intended use

This model is intended for:

  • educational use
  • demonstration of transfer learning
  • comparison against CLIP and OpenAI vision models
  • classification of images belonging to the selected 3 classes
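
As a minimal sketch of the classification use case, the snippet below loads the checkpoint through the transformers image-classification pipeline. The repository id is the one shown on this page. The helper names load_classifier and top_label are hypothetical, and the sketch assumes the checkpoint's config maps class indices to the three breed names.

```python
MODEL_ID = "vasanthi8134/oxford-pets-3class-vit"  # repository id as shown on this page


def load_classifier(model_id: str = MODEL_ID):
    """Build an image-classification pipeline; downloads the weights on first use."""
    # Imported lazily so that top_label() below stays usable without transformers.
    from transformers import pipeline

    return pipeline("image-classification", model=model_id)


def top_label(predictions: list[dict]) -> tuple[str, float]:
    """Return the (label, score) pair with the highest score.

    `predictions` is the pipeline output: a list of {"label": ..., "score": ...} dicts.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]
```

Calling top_label(load_classifier()("my_pet.jpg")) would then return the predicted breed and its score, where my_pet.jpg is a placeholder path.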

Limitations

This model has important limitations:

  • it was trained on only 60 images (20 per class)
  • it only supports 3 classes
  • it is not suitable for real-world production use
  • it always outputs one of the 3 trained classes, so predictions for other breeds, other animals, or unrelated images are unreliable

Training and evaluation data

Dataset source

  • Hugging Face dataset loader: load_dataset("pcuenq/oxford-pets")

Dataset used in this project

A custom subset was created from the Oxford-IIIT Pet dataset.

Selected classes:

  • Egyptian Mau
  • leonberger
  • samoyed

Dataset size

  • Total images: 90
  • Train: 60 images total (20 per class)
  • Validation: 15 images total (5 per class)
  • Test: 15 images total (5 per class)
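
The card does not say how the subset was drawn. One way to obtain a 20/5/5 split per class is sketched below, assuming the labels are available as a list of class-name strings. The function name split_indices and the seed are hypothetical.

```python
import random
from collections import defaultdict

CLASSES = ["Egyptian Mau", "leonberger", "samoyed"]


def split_indices(labels, classes=CLASSES, n_train=20, n_val=5, n_test=5, seed=42):
    """Pick n_train + n_val + n_test examples per class and split them by index."""
    # Group example indices by class, ignoring classes outside the subset.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        if label in classes:
            by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    needed = n_train + n_val + n_test
    splits = {"train": [], "validation": [], "test": []}
    for name in classes:
        indices = by_class[name]
        if len(indices) < needed:
            raise ValueError(f"class {name!r} has only {len(indices)} examples, need {needed}")
        chosen = rng.sample(indices, needed)  # sampling without replacement
        splits["train"] += chosen[:n_train]
        splits["validation"] += chosen[n_train:n_train + n_val]
        splits["test"] += chosen[n_train + n_val:]
    return splits
```

With the Hugging Face dataset loaded as ds, something like ds.select(split_indices(ds["label"])["train"]) would give the training subset. This assumes the dataset exposes a string column named label, which should be checked against the actual schema.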

Preprocessing

Training transforms

  • Random resized crop
  • Random horizontal flip
  • Conversion to tensor
  • Normalization with ViT image processor values

Validation / test transforms

  • Resize
  • Center crop
  • Conversion to tensor
  • Normalization with ViT image processor values
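
The two transform lists above could be written with torchvision roughly as follows. The crop size of 224 and the mean/std of 0.5 are assumptions based on the base checkpoint's image processor. In practice they would be read from ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k") rather than hard-coded. The function name build_transforms is hypothetical.

```python
IMAGE_SIZE = 224               # assumed input resolution (vit-base-patch16-224)
IMAGE_MEAN = [0.5, 0.5, 0.5]   # assumed to match the checkpoint's image processor
IMAGE_STD = [0.5, 0.5, 0.5]


def build_transforms(size=IMAGE_SIZE, mean=IMAGE_MEAN, std=IMAGE_STD):
    """Return (train_transform, eval_transform) mirroring the lists above."""
    # Imported lazily so the constants can be reused without torchvision installed.
    from torchvision import transforms

    normalize = transforms.Normalize(mean=mean, std=std)

    # Training: random augmentation, then tensor conversion and normalization.
    train_tf = transforms.Compose([
        transforms.RandomResizedCrop(size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])

    # Validation / test: deterministic resize and crop, same normalization.
    eval_tf = transforms.Compose([
        transforms.Resize(size),
        transforms.CenterCrop(size),
        transforms.ToTensor(),
        normalize,
    ])
    return train_tf, eval_tf
```

Keeping the evaluation transform deterministic means that repeated runs on the validation and test images give the same inputs, so accuracy differences come from the model rather than from augmentation.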

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-5
  • train_batch_size: 8
  • eval_batch_size: 8
  • num_epochs: 5
  • optimizer: AdamW (Trainer default)
  • best model selected using validation accuracy
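
For reference, these settings map onto transformers TrainingArguments roughly as sketched below. The output directory and the per-epoch evaluation schedule are assumptions, as is reading the batch sizes as per-device values. The eval_strategy argument is called evaluation_strategy in older transformers releases.

```python
TRAINING_KWARGS = {
    "output_dir": "vit-oxford-pets-3class",  # hypothetical output directory
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,        # assumes a single device
    "per_device_eval_batch_size": 8,
    "num_train_epochs": 5,
    # Evaluate and checkpoint every epoch so the best epoch can be restored.
    "eval_strategy": "epoch",                # evaluation_strategy in older releases
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    # Requires a compute_metrics function that returns an "accuracy" key.
    "metric_for_best_model": "accuracy",
}


def make_training_args(**overrides):
    """Build TrainingArguments from the settings above, with optional overrides."""
    # Imported lazily so the settings dict can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

No optimizer is set explicitly because the card states that the Trainer default (AdamW) was used.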

Training results

Final evaluation

  • Validation accuracy: 1.0 (15 of 15 images)
  • Test accuracy: 1.0 (15 of 15 images)

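Because the task is limited to 3 classes and each evaluation split contains only 15 images, perfect accuracy is achievable on this setup but is only a coarse estimate: a single misclassified image would lower the score by about 6.7 percentage points.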
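
To put a number on how much 15 correct predictions out of 15 can support, the sketch below computes a one-sided exact (Clopper-Pearson) lower bound on the true accuracy. It assumes the test images are an independent random sample of the kind of images the model will see. The function name is hypothetical.

```python
def exact_lower_bound_all_correct(n: int, confidence: float = 0.95) -> float:
    """One-sided Clopper-Pearson lower bound on accuracy when all n predictions are correct."""
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1.0 - confidence
    # With n successes out of n trials, the bound p solves p**n = alpha.
    return alpha ** (1.0 / n)


lower = exact_lower_bound_all_correct(15)
print(f"15/15 correct: accuracy is at least {lower:.3f} at 95% confidence")
```

For 15 images the bound is about 0.82, so the reported 1.0 is compatible with a true accuracy noticeably below 100%.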

Example prediction behavior

Example: leonberger image

  • ViT model: leonberger
  • CLIP: leonberger
  • OpenAI: leonberger

Example: Egyptian Mau image

  • ViT model: Egyptian Mau
  • CLIP: Egyptian Mau
  • OpenAI: Egyptian Mau
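
The CLIP side of a comparison like the one above could be reproduced with the transformers zero-shot image classification pipeline, as sketched here. The card does not name the CLIP checkpoint, so openai/clip-vit-base-patch32 is an assumption, and the helper names are hypothetical.

```python
CLASSES = ["Egyptian Mau", "leonberger", "samoyed"]
CLIP_MODEL_ID = "openai/clip-vit-base-patch32"  # assumption: checkpoint not named in the card


def load_zero_shot_classifier(model_id: str = CLIP_MODEL_ID):
    """Build a zero-shot image classifier; downloads the weights on first use."""
    # Imported lazily so clip_predict() below can be exercised without transformers.
    from transformers import pipeline

    return pipeline("zero-shot-image-classification", model=model_id)


def clip_predict(classifier, image, classes=CLASSES):
    """Return the class name that the zero-shot classifier scores highest."""
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    results = classifier(image, candidate_labels=list(classes))
    return max(results, key=lambda r: r["score"])["label"]
```

Restricting candidate_labels to the same three class names keeps the zero-shot model on the same label space as the fine-tuned ViT, which makes the per-image predictions directly comparable.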

Related resources

  • Model repository: vasanthi8134/oxford-pets-3class-vit
  • Base model: google/vit-base-patch16-224-in21k
  • Weights: Safetensors, 85.8M parameters, F32