# SAM 3.1 Universal Gripper Segmentation

A fine-tune of facebook/sam3 (840M parameters) for universal robot gripper segmentation, trained across the DROID and AgiBOT datasets.
## Files
| File | Size | Description |
|---|---|---|
| model_weights.pt | 3.14 GB | Recommended: model state_dict only (no optimizer state) |
| checkpoint.pt | 9.39 GB | Full training checkpoint (model + optimizer + scheduler, for resuming training) |
| configs/gripper_universal.yaml | - | Training configuration (Hydra) |
## Quick Start

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download("sazirarrwth99/sam3.1-gripper-universal", "model_weights.pt")
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
state_dict = ckpt["model"]  # 841M parameters, epoch 30
```
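To sanity-check the download before instantiating the full model, you can count the parameters in the state dict (it should come out near 840M for this checkpoint). The helper below is a hypothetical convenience, not part of this repo:

```python
import torch

def count_params(state_dict):
    """Total number of parameters across all tensors in a state_dict.

    Useful as a quick integrity check on a downloaded checkpoint
    (here, roughly 840M is expected)."""
    return sum(t.numel() for t in state_dict.values())
```

For example, `count_params(ckpt["model"])` on the Quick Start checkpoint should report the full parameter count without needing the model class on hand.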
## Training Details
- Base model: facebook/sam3 (SAM 3, 840M params)
- Architecture: Sam3Image (ViTDet backbone)
- Resolution: 1008x1008
- Datasets: DROID gripper (10,010 images) + RoboEngine/AgiBOT (3,628 images) = 13,667 total
- Split: 12,560 train / 1,107 val
- Training: 30 epochs, batch size 2, gradient accumulation 4, AMP bfloat16
- Hardware: 1x NVIDIA A100-SXM4-40GB, ~47 hours total
- Loss: 62.1 → 35.7 → 24.8 → 22.6 → 21.9 → 20.9 (epoch 28, best) → 21.7 (final)
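The batch settings above imply an effective batch size of 8 (batch size 2 × accumulation 4). A minimal sketch of that update pattern with bfloat16 autocast, using a toy linear model as a stand-in for SAM 3 (the actual training loop lives in the repo's Hydra-configured trainer):

```python
import torch

# Toy stand-ins for the real model and loss; only the accumulation/AMP
# pattern mirrors the training setup described above.
model = torch.nn.Linear(16, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
ACCUM = 4  # gradient accumulation steps

def train_steps(batches):
    opt.zero_grad()
    for i, (x, y) in enumerate(batches):
        # Forward pass in bfloat16 (mirrors "AMP bfloat16" above;
        # on GPU this would be torch.autocast("cuda", ...)).
        with torch.autocast("cpu", dtype=torch.bfloat16):
            out = model(x)
        # Compute the loss in fp32; divide by ACCUM so accumulated
        # gradients average rather than sum across micro-batches.
        loss = torch.nn.functional.mse_loss(out.float(), y)
        (loss / ACCUM).backward()
        if (i + 1) % ACCUM == 0:
            opt.step()       # one optimizer step per 4 micro-batches
            opt.zero_grad()
```

With batch size 2 per micro-batch, each `opt.step()` therefore reflects gradients from 8 samples.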
## Classes
- robot_arm: Robot arm / manipulator body
- gripper: End-effector / gripper
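For visualizing predictions, a common pattern is to blend per-class colors onto the input frame. The integer id convention and colors below are assumptions for illustration (check the training config for the actual mapping):

```python
import numpy as np

# Assumed id convention: 0 = background; the real mapping may differ.
CLASSES = {0: "background", 1: "robot_arm", 2: "gripper"}
PALETTE = {1: (255, 0, 0), 2: (0, 255, 0)}  # arbitrary display colors

def overlay(image, mask, alpha=0.5):
    """Alpha-blend per-class colors onto an HxWx3 uint8 image wherever
    the HxW integer mask matches a class id in PALETTE."""
    out = image.astype(np.float32)
    for cls, color in PALETTE.items():
        sel = mask == cls
        out[sel] = (1 - alpha) * out[sel] + alpha * np.array(color, np.float32)
    return out.astype(np.uint8)
```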