# SAM 3.1 Universal Gripper Segmentation

A fine-tune of facebook/sam3 (840M parameters) for universal robot gripper segmentation, trained across the DROID and AgiBOT datasets.
## Files
| File | Size | Description |
|---|---|---|
| model_weights.pt | 3.14 GB | Recommended: model state_dict only (no optimizer state) |
| checkpoint.pt | 9.39 GB | Full training checkpoint (model + optimizer + scheduler, for resuming training) |
| configs/gripper_universal.yaml | - | Training configuration (Hydra) |
## Quick Start

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download("sazirarrwth99/sam3.1-gripper-universal", "model_weights.pt")
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
state_dict = ckpt["model"]  # 841M parameters, epoch 30
```
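To sanity-check the download before instantiating the full model, you can count the parameters in the state dict (it should come out near 840M for this checkpoint). The helper below is a hypothetical convenience, not part of this repo:

```python
import torch

def count_params(state_dict):
    """Total number of parameters across all tensors in a state_dict.

    Useful as a quick integrity check on a downloaded checkpoint
    (here, roughly 840M is expected)."""
    return sum(t.numel() for t in state_dict.values())
```

For example, `count_params(ckpt["model"])` on the Quick Start checkpoint should report the full parameter count without needing the model class on hand.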
## Training Details
- Base model: facebook/sam3 (SAM 3, 840M params)
- Architecture: Sam3Image (ViTDet backbone)
- Resolution: 1008x1008
- Datasets: DROID gripper (10,010 images) + RoboEngine/AgiBOT (3,628 images) = 13,667 total
- Split: 12,560 train / 1,107 val
- Training: 30 epochs, batch size 2, gradient accumulation 4, AMP bfloat16
- Hardware: 1x NVIDIA A100-SXM4-40GB, ~47 hours total
- Loss: 62.1 → 35.7 → 24.8 → 22.6 → 21.9 → 20.9 (epoch 28, best) → 21.7 (final)
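The batch settings above imply an effective batch size of 8 (batch size 2 × accumulation 4). A minimal sketch of that update pattern with bfloat16 autocast, using a toy linear model as a stand-in for SAM 3 (the actual training loop lives in the repo's Hydra-configured trainer):

```python
import torch

# Toy stand-ins for the real model and loss; only the accumulation/AMP
# pattern mirrors the training setup described above.
model = torch.nn.Linear(16, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
ACCUM = 4  # gradient accumulation steps

def train_steps(batches):
    opt.zero_grad()
    for i, (x, y) in enumerate(batches):
        # Forward pass in bfloat16 (mirrors "AMP bfloat16" above;
        # on GPU this would be torch.autocast("cuda", ...)).
        with torch.autocast("cpu", dtype=torch.bfloat16):
            out = model(x)
        # Compute the loss in fp32; divide by ACCUM so accumulated
        # gradients average rather than sum across micro-batches.
        loss = torch.nn.functional.mse_loss(out.float(), y)
        (loss / ACCUM).backward()
        if (i + 1) % ACCUM == 0:
            opt.step()       # one optimizer step per 4 micro-batches
            opt.zero_grad()
```

With batch size 2 per micro-batch, each `opt.step()` therefore reflects gradients from 8 samples.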
## Classes
- robot_arm: Robot arm / manipulator body
- gripper: End-effector / gripper
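For visualizing predictions, a common pattern is to blend per-class colors onto the input frame. The integer id convention and colors below are assumptions for illustration (check the training config for the actual mapping):

```python
import numpy as np

# Assumed id convention: 0 = background; the real mapping may differ.
CLASSES = {0: "background", 1: "robot_arm", 2: "gripper"}
PALETTE = {1: (255, 0, 0), 2: (0, 255, 0)}  # arbitrary display colors

def overlay(image, mask, alpha=0.5):
    """Alpha-blend per-class colors onto an HxWx3 uint8 image wherever
    the HxW integer mask matches a class id in PALETTE."""
    out = image.astype(np.float32)
    for cls, color in PALETTE.items():
        sel = mask == cls
        out[sel] = (1 - alpha) * out[sel] + alpha * np.array(color, np.float32)
    return out.astype(np.uint8)
```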