# SAM 3.1 Universal Gripper Segmentation

Fine-tuned `facebook/sam3` (840M params) for universal robot gripper segmentation across the DROID and AgiBOT datasets.

## Files

| File | Size | Description |
|---|---|---|
| `model_weights.pt` | 3.14 GB | **Recommended:** model `state_dict` only (no optimizer state) |
| `checkpoint.pt` | 9.39 GB | Full training checkpoint (model + optimizer + scheduler, for resuming training) |
| `configs/gripper_universal.yaml` | – | Training configuration (Hydra) |
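The size gap between `checkpoint.pt` (9.39 GB) and `model_weights.pt` (3.14 GB) comes from dropping the optimizer and scheduler state. A minimal sketch of producing a weights-only file from a full checkpoint (the key names `"model"`, `"optimizer"`, and `"epoch"` are assumptions based on this card; verify against the actual checkpoint):

```python
import torch
import torch.nn as nn

def strip_checkpoint(full_ckpt: dict) -> dict:
    """Keep only the model weights (and epoch metadata) from a full checkpoint."""
    return {"model": full_ckpt["model"], "epoch": full_ckpt.get("epoch")}

# Toy example: a small module stands in for the real SAM 3 model.
net = nn.Linear(4, 2)
opt = torch.optim.AdamW(net.parameters())
full = {"model": net.state_dict(), "optimizer": opt.state_dict(), "epoch": 30}

lean = strip_checkpoint(full)
torch.save(lean, "model_weights_demo.pt")  # far smaller than the full checkpoint
```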

## Quick Start

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download("sazirarrwth99/sam3.1-gripper-universal", "model_weights.pt")
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
state_dict = ckpt["model"]  # 841M parameters, epoch 30
```

## Training Details

  • Base model: facebook/sam3 (SAM 3, 840M params)
  • Architecture: Sam3Image (ViTDet backbone)
  • Resolution: 1008x1008
  • Datasets: DROID gripper (10,010 images) + RoboEngine/AgiBOT (3,628 images) = 13,667 total
  • Split: 12,560 train / 1,107 val
  • Training: 30 epochs, batch size 2, gradient accumulation 4, AMP bfloat16
  • Hardware: 1x NVIDIA A100-SXM4-40GB, ~47 hours total
  • Loss: 62.1 -> 35.7 -> 24.8 -> 22.6 -> 21.9 -> 20.9 (epoch 28 best) -> 21.7 (final)
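The optimization setup above (batch size 2, gradient accumulation 4, bfloat16 AMP) can be sketched as follows. The model and data are stand-ins, not the actual Sam3Image fine-tuning code; the real run operates on 1008x1008 images:

```python
import torch

# Stand-in model and synthetic data (the real run fine-tunes Sam3Image).
model = torch.nn.Linear(8, 1)
w0 = model.weight.detach().clone()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
accum = 4  # gradient accumulation steps -> effective batch size 2 * 4 = 8

data = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(8)]
opt.zero_grad()
for step, (x, y) in enumerate(data):
    # bfloat16 autocast; no GradScaler needed (unlike float16 AMP).
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accum).backward()  # scale so accumulated grads average, not sum
    if (step + 1) % accum == 0:
        opt.step()
        opt.zero_grad()
```

On the A100 run described above, `device_type` would be `"cuda"`; `"cpu"` keeps the sketch runnable anywhere.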

## Classes

  1. robot_arm: Robot arm / manipulator body
  2. gripper: End-effector / gripper
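A minimal sketch of splitting a predicted integer label mask into per-class binary masks using the indices above (treating 0 as background, which the card does not list explicitly):

```python
import numpy as np

# Class indices from the card; 0 = background is an assumption.
CLASSES = {1: "robot_arm", 2: "gripper"}

# Tiny example label mask standing in for a real prediction.
label_mask = np.array([[0, 1, 1],
                       [0, 2, 2],
                       [0, 0, 2]])

masks = {name: (label_mask == idx) for idx, name in CLASSES.items()}
print({name: int(m.sum()) for name, m in masks.items()})  # pixel count per class
```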