Hybrid Fish Classification: ConvNeXt Tiny
Model Description
This is a fine-tuned ConvNeXt Tiny model designed for high-accuracy classification of 40 fish species. It serves as the "Local Expert" in a larger hybrid system that integrates BioCLIP-2 for zero-shot classification.
The model was trained using a rigorous Two-Stage Transfer Learning strategy and a custom Data Balancing Engine to handle class imbalance and background noise in underwater imagery.
- Architecture: ConvNeXt Tiny (Pre-trained on ImageNet)
- Task: Multi-class Image Classification (40 Classes)
- Input Size: 224x224 RGB
- Test Accuracy: 98.96%
Performance & Metrics
The model reached 98.96% accuracy on a balanced test set of 2,027 images.
| Metric | Score |
|---|---|
| Accuracy | 98.96% |
| F1-Score (Macro) | 0.97 |
| Precision | 0.97 |
| Recall | 0.97 |
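Accuracy and the macro-averaged scores can differ because macro averaging weights every class equally, so one weaker class pulls the average down more than it pulls down overall accuracy. The sketch below shows how these metrics are conventionally derived from a confusion matrix. The function names and the 3-class toy matrix are illustrative assumptions, not the evaluation code or data behind the table above.

```python
def per_class_metrics(confusion):
    """Return (precisions, recalls, f1s), one entry per class.

    confusion[i][j] counts samples whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return precisions, recalls, f1s


def macro_average(values):
    """Unweighted mean over classes (every class counts equally)."""
    return sum(values) / len(values)


def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Toy 3-class matrix (made-up numbers): the small third class is often
# confused with the second one.
toy = [
    [50, 0, 0],
    [0, 48, 2],
    [0, 5, 5],
]
precisions, recalls, f1s = per_class_metrics(toy)
print("accuracy:", round(accuracy(toy), 4))
print("macro F1:", round(macro_average(f1s), 4))
```

In the toy matrix the small, frequently confused class lowers macro F1 well below accuracy, which is the same effect, on a much smaller scale, as a single class such as Mudfish scoring below the rest.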
Key Class Performance
Several commercial species reached perfect precision and recall on the test set:
- Sea Bass: 1.00 Precision / 1.00 Recall
- Trout: 1.00 Precision / 1.00 Recall
- Red Mullet: 1.00 Precision / 1.00 Recall
(See `confusion_matrix_final.png` in the repository files for a detailed analysis.)
Supported Species & Performance
The model is fine-tuned to classify 40 specific local species. The full list is given below with each species' F1-score on the test set.
| ID | Species | F1 Score | ID | Species | F1 Score |
|---|---|---|---|---|---|
| 1 | Bangus | 0.99 | 21 | Knifefish | 1.00 |
| 2 | Big Head Carp | 1.00 | 22 | Long-Snouted Pipefish | 1.00 |
| 3 | Black Sea Sprat | 1.00 | 23 | Mosquito Fish | 0.99 |
| 4 | Black Spotted Barb | 1.00 | 24 | Mudfish | 0.91 |
| 5 | Catfish | 0.98 | 25 | Mullet | 0.97 |
| 6 | Climbing Perch | 0.97 | 26 | Pangasius | 0.97 |
| 7 | Fourfinger Threadfin | 1.00 | 27 | Perch | 0.98 |
| 8 | Freshwater Eel | 0.99 | 28 | Red Mullet | 1.00 |
| 9 | Gilt-Head Bream | 1.00 | 29 | Red Sea Bream | 1.00 |
| 10 | Glass Perchlet | 0.98 | 30 | Scat Fish | 1.00 |
| 11 | Goby | 0.98 | 31 | Sea Bass | 1.00 |
| 12 | Gold Fish | 1.00 | 32 | Shrimp | 1.00 |
| 13 | Gourami | 1.00 | 33 | Silver Barb | 0.98 |
| 14 | Grass Carp | 0.98 | 34 | Silver Carp | 1.00 |
| 15 | Green Spotted Puffer | 0.98 | 35 | Silver Perch | 0.98 |
| 16 | Horse Mackerel | 1.00 | 36 | Snakehead | 0.97 |
| 17 | Indian Carp | 0.98 | 37 | Striped Red Mullet | 1.00 |
| 18 | Indo-Pacific Tarpon | 1.00 | 38 | Tenpounder | 0.97 |
| 19 | Jaguar Gapote | 1.00 | 39 | Tilapia | 0.98 |
| 20 | Janitor Fish | 1.00 | 40 | Trout | 1.00 |
Plus: For any species not listed above, the integrated BioCLIP-2 model provides zero-shot classification capabilities.
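This card does not describe how the hybrid system decides between the two models. One plausible arrangement, sketched below purely as an assumption, is to trust the local expert when its confidence is high and defer to the zero-shot model otherwise. `hybrid_predict`, the `threshold` value of 0.80, and the two callables are hypothetical names; `zero_shot` stands in for a BioCLIP-2 wrapper that is not shown here.

```python
def hybrid_predict(image_path, local_expert, zero_shot, threshold=0.80):
    """Route one image between the local expert and a zero-shot fallback.

    local_expert and zero_shot are callables that take an image path and
    return a dict with "class" and "confidence" keys. The threshold is an
    assumed value, not one taken from the original system.
    """
    local = local_expert(image_path)
    if local["confidence"] >= threshold:
        return {**local, "source": "local_expert"}
    # Low confidence may indicate a species outside the 40 trained classes.
    fallback = zero_shot(image_path)
    return {**fallback, "source": "zero_shot"}


# Stub models so the sketch runs without any weights (made-up outputs).
def fake_local_expert(image_path):
    return {"class": "Mudfish", "confidence": 0.41}


def fake_zero_shot(image_path):
    return {"class": "Clownfish", "confidence": 0.66}


print(hybrid_predict("unknown_fish.jpg", fake_local_expert, fake_zero_shot))
```

A softmax confidence is only a rough signal for out-of-distribution inputs, so any threshold of this kind would need to be tuned on held-out images of unlisted species.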
Limitations & Scope
While the model reaches high accuracy (98.96%) on the test set, with no sign of overfitting there, users should consider the following before real-world deployment:
- Lightweight Architecture: This model utilizes `ConvNeXt Tiny` to prioritize inference speed and efficiency. While optimized, it may have lower capacity compared to "Huge" or "Large" architectures in extremely complex scenes.
- Synthetic Data: To address class imbalance, the training set includes synthetically augmented images (up-sampling). Performance on raw, unconstrained real-world images with severe occlusion or extreme lighting conditions may vary.
- Intended Use: This model represents a high-level research baseline. It is recommended for academic analysis, prototyping, and controlled environments rather than immediate mission-critical commercial deployment.
Training Procedure
The training process involved advanced data engineering and optimization techniques:
1. Data Engineering
- Background Removal: All images processed with `Rembg` to remove noise (25% probability during augmentation).
- Data Balancing:
  - Up-Sampling: Minority classes augmented to min 250 images using safe augmentation techniques.
  - Down-Sampling: Majority classes capped at 500 images.
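The balancing rule above (at least 250 and at most 500 images per class) can be sketched as a per-class plan. This is a minimal illustration, not the original Data Balancing Engine: `balance_class` and the file names are hypothetical, and the augmentation step itself is left out, so the function only reports which source images would be augmented.

```python
import random

MIN_PER_CLASS = 250  # up-sampling target stated above
MAX_PER_CLASS = 500  # down-sampling cap stated above


def balance_class(paths, rng, min_count=MIN_PER_CLASS, max_count=MAX_PER_CLASS):
    """Return (kept_originals, sources_to_augment) for one class.

    Majority classes are randomly down-sampled to max_count. Minority
    classes keep every original and get a list of source images (drawn
    with replacement) from which augmented copies would be generated.
    """
    if not paths:
        return [], []
    if len(paths) > max_count:
        return rng.sample(paths, max_count), []
    shortfall = max(0, min_count - len(paths))
    to_augment = [rng.choice(paths) for _ in range(shortfall)]
    return list(paths), to_augment


# Made-up class sizes to show the three cases.
rng = random.Random(42)  # fixed seed for a reproducible plan
dataset = {
    "Tilapia": [f"tilapia_{i}.jpg" for i in range(600)],
    "Mudfish": [f"mudfish_{i}.jpg" for i in range(100)],
    "Trout": [f"trout_{i}.jpg" for i in range(300)],
}
for name, paths in dataset.items():
    kept, to_augment = balance_class(paths, rng)
    print(name, "originals:", len(kept), "augmented copies:", len(to_augment))
```

Balancing of this kind is normally applied to the training split only, so that augmented copies of an image never end up in the test set.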
2. Training Strategy (Two-Stage)
- Stage 1 (Warm-up): Frozen backbone, trained only the classifier head (`LR=1e-3`) for 6 epochs.
- Stage 2 (Fine-Tuning): Full model unfreezing with Differential Learning Rates:
  - Backbone: `1e-5` (To preserve feature extraction)
  - Head: `1e-4` (To adapt to new classes)
- Scheduler: Cosine Annealing LR.
- Optimizer: Adam.
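To make the Stage 2 schedule concrete, the sketch below evaluates the standard closed-form cosine annealing expression, the same one documented for PyTorch's `CosineAnnealingLR`, for the two learning rates listed above. The number of Stage 2 epochs and the minimum learning rate are not stated in this card, so `num_epochs=10` and `eta_min=0.0` are assumptions, and `stage2_schedule` is a hypothetical helper rather than the original training code.

```python
import math

# Stage 2 base learning rates stated above.
STAGE2_BASE_LRS = {"backbone": 1e-5, "head": 1e-4}


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Closed-form cosine annealing: decays from base_lr to eta_min over t_max epochs."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


def stage2_schedule(num_epochs, base_lrs=STAGE2_BASE_LRS):
    """Per-epoch learning rate for each parameter group."""
    return [
        {name: cosine_annealing_lr(lr, epoch, num_epochs) for name, lr in base_lrs.items()}
        for epoch in range(num_epochs)
    ]


# num_epochs=10 is an assumed value for illustration only.
for epoch, lrs in enumerate(stage2_schedule(10)):
    print(f"epoch {epoch}: backbone={lrs['backbone']:.2e} head={lrs['head']:.2e}")
```

With `eta_min=0.0` both groups decay along the same curve, so the head stays at ten times the backbone rate throughout fine-tuning.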
How to Use
You can load this model with PyTorch and torchvision.
```python
import torch
from torchvision import models, transforms
from PIL import Image

# 1. Define Class Names (40 Classes)
class_names = [
    'Bangus', 'Big Head Carp', 'Black Sea Sprat', 'Black Spotted Barb', 'Catfish',
    'Climbing Perch', 'Fourfinger Threadfin', 'Freshwater Eel', 'Gilt-Head Bream',
    'Glass Perchlet', 'Goby', 'Gold Fish', 'Gourami', 'Grass Carp',
    'Green Spotted Puffer', 'Hourse Mackerel', 'Indian Carp', 'Indo-Pacific Tarpon',
    'Jaguar Gapote', 'Janitor Fish', 'KnifeFish', 'Long-Snouted Pipefish',
    'Mosquito Fish', 'Mudfish', 'Mullet', 'Pangasius', 'Perch', 'Red Mullet',
    'Red Sea Bream', 'Scat Fish', 'Sea Bass', 'Shrimp', 'Silver Barb',
    'Silver Carp', 'Silver Perch', 'Snakehead', 'Striped Red Mullet',
    'Tenpounder', 'Tilapia', 'Trout'
]

# 2. Load Architecture
model = models.convnext_tiny()
# Replace the final linear layer so the head outputs one logit per class
model.classifier[2] = torch.nn.Linear(model.classifier[2].in_features, 40)

# 3. Load Weights
# Ensure 'balik_modeli_final.pth' is downloaded locally
weights_path = "balik_modeli_final.pth"
model.load_state_dict(torch.load(weights_path, map_location=torch.device('cpu')))
model.eval()

# 4. Inference Function
def predict_image(image_path):
    # Resize to the 224x224 input size and apply ImageNet normalization
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    try:
        img = Image.open(image_path).convert('RGB')
        img_tensor = transform(img).unsqueeze(0)  # add batch dimension
        with torch.no_grad():
            outputs = model(img_tensor)
            probs = torch.nn.functional.softmax(outputs, dim=1)
            score, predicted = torch.max(probs, 1)
        return {
            "class": class_names[predicted.item()],
            "confidence": float(score.item())
        }
    except Exception as e:
        # Return a dict on failure too, so callers can always index the result
        return {"error": str(e)}

# Example Usage:
# result = predict_image("test_fish.jpg")
# if "error" in result:
#     print(f"Failed: {result['error']}")
# else:
#     print(f"Predicted: {result['class']} ({result['confidence']:.2%})")
```
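The inference function returns only the single most likely class. When similar-looking species score close together, a ranked shortlist can be more informative. The sketch below is a framework-free illustration of that idea: `softmax` and `top_k` are hypothetical helpers, and the logits and labels are made up. With the model above, the logits for one image could be taken from `outputs[0].tolist()`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for four of the supported classes.
example_logits = [0.3, 2.1, 1.9, -0.5]
example_names = ["Catfish", "Mudfish", "Snakehead", "Trout"]
for name, prob in top_k(example_logits, example_names, k=2):
    print(f"{name}: {prob:.2%}")
```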
Model tree for Heretix/convnext-fish-classifier
Base model
imageomics/bioclip-2