---
license: mit
tags:
- 3d-generation
- image-to-3d
- onnx
- ios
- mobile
datasets: []
pipeline_tag: image-to-3d
---

# TripoSR iOS (ONNX)

This is the **ONNX-converted encoder** from [TripoSR](https://github.com/VAST-AI-Research/TripoSR), a fast feedforward 3D reconstruction model from Stability AI and Tripo AI.

## Model Details

| Property | Value |
|----------|-------|
| Model Size | ~1.6 GB |
| Parameters | 419M |
| Input | RGB image (1, 3, 512, 512) |
| Output | Scene codes / triplane (1, 3, 40, 64, 64) |
| ONNX Opset | 18 |
| Format | ONNX with external data |

## Usage

### Python (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

# Load the model
session = ort.InferenceSession(
    "triposr_encoder.onnx",
    providers=["CPUExecutionProvider"],  # or 'CoreMLExecutionProvider' on iOS/macOS
)

# Preprocess: resize to 512x512, scale to [0, 1], convert HWC -> NCHW
image = Image.open("your_image.png").convert("RGB").resize((512, 512))
input_array = np.asarray(image, dtype=np.float32) / 255.0
input_array = input_array.transpose(2, 0, 1)[np.newaxis, ...]

# Run inference
scene_codes = session.run(None, {"input_image": input_array})[0]
print(f"Scene codes shape: {scene_codes.shape}")  # (1, 3, 40, 64, 64)
```

### iOS (Swift with ONNX Runtime)

Add ONNX Runtime to your project via Swift Package Manager:

```
https://github.com/microsoft/onnxruntime-swift-package-manager
```

```swift
import OnnxRuntimeBindings

// Create the environment and load the model
let env = try ORTEnv(loggingLevel: .warning)
let session = try ORTSession(env: env, modelPath: modelPath, sessionOptions: nil)

// Run inference (imageData holds float32 pixels in NCHW order)
let inputTensor = try ORTValue(
    tensorData: imageData,
    elementType: .float,
    shape: [1, 3, 512, 512]
)
let outputs = try session.run(
    withInputs: ["input_image": inputTensor],
    outputNames: ["scene_codes"],
    runOptions: nil
)
```

## Architecture

This model is the encoder portion of TripoSR:

1. **Image Tokenizer** - DINO ViT-B/16 pretrained vision transformer
2. **Backbone** - Transformer decoder with cross-attention
3. **Post Processor** - Converts tokens to a triplane representation

The output "scene codes" are triplane features that can be used with a decoder and the marching cubes algorithm to extract 3D meshes.

## Files

- `triposr_encoder.onnx` - ONNX model graph (2.6 MB)
- `triposr_encoder.onnx.data` - Model weights (1.6 GB)

## Citation

Original TripoSR paper:

```bibtex
@article{TripoSR2024,
  title={TripoSR: Fast 3D Object Reconstruction from a Single Image},
  author={Tochilkin, Dmitry and Pankratz, David and Liu, Zexiang and Huang, Zixuan and Letts, Adam and Li, Yangguang and Liang, Ding and Laforte, Christian and Jampani, Varun and Cao, Yan-Pei},
  journal={arXiv preprint arXiv:2403.02151},
  year={2024}
}
```

## License

MIT License (same as the original TripoSR).
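
## Appendix: Querying Triplane Features (Illustration)

The decoder is not included in this repository, but the sketch below illustrates how triplane features like the scene codes above are typically queried: a 3D point is projected onto the three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are aggregated. The function names, the plane-to-axis assignment, and the sum aggregation are illustrative assumptions, not the exact TripoSR decoder.

```python
import numpy as np


def sample_plane(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at normalized coords u, v in [-1, 1]."""
    c, h, w = plane.shape
    x = (u + 1) / 2 * (w - 1)
    y = (v + 1) / 2 * (h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = plane[:, y0, x0] * (1 - fx) + plane[:, y0, x1] * fx
    bot = plane[:, y1, x0] * (1 - fx) + plane[:, y1, x1] * fx
    return top * (1 - fy) + bot * fy


def query_triplane(scene_codes, point):
    """Query triplane features of shape (1, 3, C, H, W) at a 3D point in [-1, 1]^3.

    Projects the point onto the three planes, samples each bilinearly,
    and sums the results (a common aggregation choice; the plane/axis
    convention here is an assumption).
    """
    planes = scene_codes[0]  # (3, C, H, W)
    x, y, z = point
    return (
        sample_plane(planes[0], x, y)
        + sample_plane(planes[1], x, z)
        + sample_plane(planes[2], y, z)
    )  # (C,) feature vector, to be fed to a decoder MLP


# Example with random scene codes of the encoder's output shape
scene_codes = np.random.default_rng(0).standard_normal((1, 3, 40, 64, 64)).astype(np.float32)
features = query_triplane(scene_codes, (0.1, -0.2, 0.3))
print(features.shape)  # (40,)
```

In a full pipeline, such per-point features would be decoded to density/color and fed to marching cubes over a dense grid of query points to extract a mesh.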