How to use vasanthi8134/oxford-pets-3class-vit with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("image-classification", model="vasanthi8134/oxford-pets-3class-vit")
    pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")

    # Load model directly
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained("vasanthi8134/oxford-pets-3class-vit")
    model = AutoModelForImageClassification.from_pretrained("vasanthi8134/oxford-pets-3class-vit")
oxford-pets-3class-vit
This model is a fine-tuned version of google/vit-base-patch16-224-in21k for a simplified pet image classification task.
It was trained on a custom 3-class subset of the Oxford-IIIT Pet dataset with the following classes:
- Egyptian Mau
- leonberger
- samoyed
Model description
This is a transfer learning model created for an educational computer vision project.
The goal of the project was to compare:
- a fine-tuned ViT model
- a zero-shot CLIP model
- a closed-source OpenAI vision model
The model is designed to classify images into one of the three selected pet classes.
Intended uses & limitations
Intended use
This model is intended for:
- educational use
- demonstration of transfer learning
- comparison against CLIP and OpenAI vision models
- classification of images belonging to the selected 3 classes
Limitations
This model has important limitations:
- it was trained on a very small dataset
- it only supports 3 classes
- it is not suitable for real-world production use
- predictions on unrelated animals or unseen categories may be unreliable
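The last limitation follows from how a 3-way classifier produces its output: the softmax over three logits always sums to 1, so every input is assigned one of the three breeds. The sketch below illustrates this with made-up logits. The `predict` helper and the `min_confidence` threshold are hypothetical and not part of this model; rejecting low-confidence predictions is only a rough heuristic.

```python
import math

# Class names come from the model card; the logits below are made-up numbers.
CLASSES = ["Egyptian Mau", "leonberger", "samoyed"]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, min_confidence=0.0):
    """Return (label, probability); label is None when below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = CLASSES[best] if probs[best] >= min_confidence else None
    return label, probs[best]


# Hypothetical logits for an image that shows none of the three breeds.
unrelated_logits = [0.2, 0.5, 0.1]

# Without a threshold, the model still names one of its three classes.
label, prob = predict(unrelated_logits)
print(label, round(prob, 3))

# With a (hypothetical) threshold, the weak prediction is rejected instead.
print(predict(unrelated_logits, min_confidence=0.6)[0])
```

With real model outputs, the logits would come from `model(**inputs).logits`. A threshold of this kind reduces, but does not remove, confident mistakes on unseen categories.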
Training and evaluation data
Dataset source
- Hugging Face dataset loader:
load_dataset("pcuenq/oxford-pets")
Dataset used in this project
A custom subset was created from the Oxford-IIIT Pet dataset.
Selected classes:
- Egyptian Mau
- leonberger
- samoyed
Dataset size
- Total images: 90
- Train: 60 images total (20 per class)
- Validation: 15 images total (5 per class)
- Test: 15 images total (5 per class)
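The card does not describe how the 90 images were chosen, so the following is only a minimal sketch of one way to draw a fixed number of images per class into each split. The function name `split_per_class`, the seed, and the placeholder file names are assumptions for illustration; in practice the records would come from the loaded dataset after filtering to the three selected classes.

```python
import random
from collections import defaultdict


def split_per_class(records, n_train=20, n_val=5, n_test=5, seed=0):
    """Split (item, label) pairs into train/validation/test with fixed counts per class."""
    by_label = defaultdict(list)
    for item, label in records:
        by_label[label].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    needed = n_train + n_val + n_test
    splits = {"train": [], "validation": [], "test": []}
    for label in sorted(by_label):
        items = list(by_label[label])
        if len(items) < needed:
            raise ValueError(f"class {label!r} has {len(items)} items, need {needed}")
        rng.shuffle(items)
        splits["train"] += [(i, label) for i in items[:n_train]]
        splits["validation"] += [(i, label) for i in items[n_train:n_train + n_val]]
        splits["test"] += [(i, label) for i in items[n_train + n_val:needed]]
    return splits


# Placeholder records standing in for images of the three selected classes.
classes = ["Egyptian Mau", "leonberger", "samoyed"]
records = [(f"{name}_{k}.jpg", name) for name in classes for k in range(40)]

splits = split_per_class(records)
print({name: len(rows) for name, rows in splits.items()})
# {'train': 60, 'validation': 15, 'test': 15}
```

Sampling each class separately keeps the splits balanced and guarantees that no image appears in more than one split.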
Preprocessing
Training transforms
- Random resized crop
- Random horizontal flip
- Conversion to tensor
- Normalization with ViT image processor values
Validation / test transforms
- Resize
- Center crop
- Conversion to tensor
- Normalization with ViT image processor values
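The last two steps of each pipeline are simple arithmetic, sketched here in plain Python for a single RGB pixel. The mean and standard deviation of 0.5 per channel are an assumption about the base model's image processor; the actual values can be read from `processor.image_mean` and `processor.image_std`.

```python
# Assumed normalization constants (verify against the loaded image processor).
IMAGE_MEAN = (0.5, 0.5, 0.5)
IMAGE_STD = (0.5, 0.5, 0.5)


def to_unit_range(pixel):
    """Mimic "conversion to tensor": scale 0..255 integers to 0.0..1.0."""
    return tuple(channel / 255.0 for channel in pixel)


def normalize(pixel, mean=IMAGE_MEAN, std=IMAGE_STD):
    """Apply (value - mean) / std to each channel."""
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, std))


black = normalize(to_unit_range((0, 0, 0)))
white = normalize(to_unit_range((255, 255, 255)))
print(black)  # (-1.0, -1.0, -1.0)
print(white)  # (1.0, 1.0, 1.0)
```

Under these assumed constants, pixel values end up in the range -1 to 1. Using the same constants for training, validation, and test keeps the inputs consistent with what the pretrained backbone expects.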
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-5
- train_batch_size: 8
- eval_batch_size: 8
- num_epochs: 5
- optimizer: AdamW (Trainer default)
- best model selected using validation accuracy
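To give a sense of how short this training run is, the number of optimizer updates can be worked out from the batch size, the epoch count, and the 60 training images. This is a rough sketch that assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept; none of these settings are stated in the card.

```python
import math


def optimizer_steps(num_examples, batch_size, num_epochs):
    """Return (steps per epoch, total steps), keeping the last partial batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * num_epochs


per_epoch, total = optimizer_steps(num_examples=60, batch_size=8, num_epochs=5)
print(per_epoch, total)  # 8 40
```

Under these assumptions the model receives 40 weight updates in total, with the last batch of each epoch holding 4 images instead of 8.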
Training results
Final evaluation
- Validation accuracy: 1.0
- Test accuracy: 1.0
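Because the task was simplified to only 3 classes and the validation and test splits contain only 15 images each, an accuracy of 1.0 means that all 15 images in each split were classified correctly. This result describes the limited setup and is not evidence of broader generalization.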
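One way to see how much uncertainty a 15-image evaluation leaves is to compute a confidence interval for the accuracy. The card does not report one; the sketch below is an added illustration using the Wilson score interval at roughly the 95% level.

```python
import math


def wilson_interval(correct, total, z=1.959964):
    """Wilson score interval for a proportion (z = 1.96 for about 95% confidence)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(correct=15, total=15)
print(f"{low:.3f} to {high:.3f}")
```

For 15 correct out of 15, the lower bound is about 0.80, so the measured score is also compatible with a true accuracy well below 1.0 on similar images.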
Example prediction behavior
Example: leonberger image
- ViT model: leonberger
- CLIP: leonberger
- OpenAI: leonberger
Example: Egyptian Mau image
- ViT model: Egyptian Mau
- CLIP: Egyptian Mau
- OpenAI: Egyptian Mau