---
license: mit
tags:
- 3d-generation
- image-to-3d
- onnx
- ios
- mobile
datasets: []
pipeline_tag: image-to-3d
---

# TripoSR iOS (ONNX)

This is the **ONNX-converted encoder** from [TripoSR](https://github.com/VAST-AI-Research/TripoSR), a fast feedforward 3D reconstruction model from Stability AI and Tripo AI.

## Model Details

| Property | Value |
|----------|-------|
| Model Size | ~1.6 GB |
| Parameters | 419M |
| Input | RGB image (1, 3, 512, 512) |
| Output | Scene codes / triplane (1, 3, 40, 64, 64) |
| ONNX Opset | 18 |
| Format | ONNX with external data |

## Usage

### Python (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

# Load the model
session = ort.InferenceSession(
    "triposr_encoder.onnx",
    providers=["CPUExecutionProvider"],  # or 'CoreMLExecutionProvider' on iOS/macOS
)

# Preprocess: resize to 512x512, scale to [0, 1], convert HWC -> NCHW
image = Image.open("your_image.png").convert("RGB").resize((512, 512))
input_array = np.asarray(image, dtype=np.float32) / 255.0
input_array = input_array.transpose(2, 0, 1)[np.newaxis, ...]

# Run inference
scene_codes = session.run(None, {"input_image": input_array})[0]
print(f"Scene codes shape: {scene_codes.shape}")  # (1, 3, 40, 64, 64)
```

### iOS (Swift with ONNX Runtime)

Add ONNX Runtime to your project via Swift Package Manager:

```
https://github.com/microsoft/onnxruntime-swift-package-manager
```

```swift
import OnnxRuntimeBindings

// Create the environment and load the model
let env = try ORTEnv(loggingLevel: .warning)
let session = try ORTSession(env: env, modelPath: modelPath, sessionOptions: nil)

// Run inference (imageData holds float32 pixels in NCHW order)
let inputTensor = try ORTValue(
    tensorData: imageData,
    elementType: .float,
    shape: [1, 3, 512, 512]
)
let outputs = try session.run(
    withInputs: ["input_image": inputTensor],
    outputNames: ["scene_codes"],
    runOptions: nil
)
```

## Architecture

This model is the encoder portion of TripoSR:

1. **Image Tokenizer** - DINO ViT-B/16 pretrained vision transformer
2. **Backbone** - Transformer decoder with cross-attention
3. **Post Processor** - Converts tokens to a triplane representation

The output "scene codes" are triplane features that can be used with a decoder and the marching cubes algorithm to extract 3D meshes.

## Files

- `triposr_encoder.onnx` - ONNX model graph (2.6 MB)
- `triposr_encoder.onnx.data` - Model weights (1.6 GB)

## Citation

Original TripoSR paper:

```bibtex
@article{TripoSR2024,
  title={TripoSR: Fast 3D Object Reconstruction from a Single Image},
  author={Tochilkin, Dmitry and Pankratz, David and Liu, Zexiang and Huang, Zixuan and Letts, Adam and Li, Yangguang and Liang, Ding and Laforte, Christian and Jampani, Varun and Cao, Yan-Pei},
  journal={arXiv preprint arXiv:2403.02151},
  year={2024}
}
```

## License

MIT License (same as the original TripoSR).
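
## Appendix: Querying Triplane Features (Illustration)

The decoder is not included in this repository, but the sketch below illustrates how triplane features like the scene codes above are typically queried: a 3D point is projected onto the three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are aggregated. The function names, the plane-to-axis assignment, and the sum aggregation are illustrative assumptions, not the exact TripoSR decoder.

```python
import numpy as np


def sample_plane(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at normalized coords u, v in [-1, 1]."""
    c, h, w = plane.shape
    x = (u + 1) / 2 * (w - 1)
    y = (v + 1) / 2 * (h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = plane[:, y0, x0] * (1 - fx) + plane[:, y0, x1] * fx
    bot = plane[:, y1, x0] * (1 - fx) + plane[:, y1, x1] * fx
    return top * (1 - fy) + bot * fy


def query_triplane(scene_codes, point):
    """Query triplane features of shape (1, 3, C, H, W) at a 3D point in [-1, 1]^3.

    Projects the point onto the three planes, samples each bilinearly,
    and sums the results (a common aggregation choice; the plane/axis
    convention here is an assumption).
    """
    planes = scene_codes[0]  # (3, C, H, W)
    x, y, z = point
    return (
        sample_plane(planes[0], x, y)
        + sample_plane(planes[1], x, z)
        + sample_plane(planes[2], y, z)
    )  # (C,) feature vector, to be fed to a decoder MLP


# Example with random scene codes of the encoder's output shape
scene_codes = np.random.default_rng(0).standard_normal((1, 3, 40, 64, 64)).astype(np.float32)
features = query_triplane(scene_codes, (0.1, -0.2, 0.3))
print(features.shape)  # (40,)
```

In a full pipeline, such per-point features would be decoded to density/color and fed to marching cubes over a dense grid of query points to extract a mesh.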