# TX-VOICE: Local-First Voice for TARX
Private voice AI that runs entirely on your machine. No cloud. No latency. No data leaving your device.
## Overview
TX-VOICE is TARX's voice model optimized for local execution. It is built on the Moshi architecture and quantized to Q8 for efficient inference on consumer hardware.
| Spec | Value |
|---|---|
| Size | 7.2 GB |
| Format | GGUF Q8 |
| Architecture | Moshi (Kyutai) |
| Min RAM | 16 GB |
| Recommended | 32 GB+ RAM, Apple Silicon or NVIDIA GPU |
## Features
- **Real-time voice synthesis**: low-latency text-to-speech
- **100% local**: no internet required after download
- **Privacy-first**: your voice data never leaves your machine
- **Hardware-optimized**: leverages Metal (macOS) or CUDA (NVIDIA)
## Quick Start
### With TARX Workbench
TX-VOICE is automatically configured in TARX Workbench. Just enable voice in settings.
### Standalone Usage
```bash
# Clone the tarx-voice service
git clone https://github.com/tarx-ai/tarx-voice
cd tarx-voice

# Build and run
cargo build --release
./target/release/tarx-voice --model TX-VOICE
```
## WebSocket API
TX-VOICE runs on port 11438 by default:
```javascript
const ws = new WebSocket('ws://localhost:11438');
// Wait for the connection to open before sending the first request.
ws.onopen = () => ws.send(JSON.stringify({ text: "Hello, world!" }));
```
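The request payload is plain JSON, so any WebSocket client can drive the service. A minimal Python sketch of the payload side (the single `text` field is the only one shown in the JavaScript snippet; any additional protocol fields would be guesses):

```python
import json

def build_tts_request(text: str) -> str:
    """Serialize a synthesis request in the shape the JavaScript
    example sends; `text` is the only field documented here."""
    return json.dumps({"text": text})

# A client library such as `websockets` could then send this string
# to ws://localhost:11438 and read the streamed response frames.
print(build_tts_request("Hello, world!"))  # {"text": "Hello, world!"}
```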
## Hardware Requirements
| Hardware | Performance |
|---|---|
| Apple M1/M2/M3/M4 | Excellent (Metal acceleration) |
| NVIDIA RTX 30/40 | Excellent (CUDA acceleration) |
| Intel/AMD CPU | Good (AVX2 optimized) |
## Integration
TX-VOICE integrates with:
- **TARX Workbench**: desktop AI assistant
- **TARX Code-OSS**: VS Code extension
- **Custom apps**: WebSocket API
## Architecture
Based on Kyutai's Moshi model, optimized for local inference:
- Streaming audio generation
- Low memory footprint via Q8 quantization
- Rust-native inference via Candle
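As a rough sanity check on the Q8 footprint (the ~7B parameter count below is inferred from the 7.2 GB file size, not stated by the model card):

```python
def weight_storage_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate raw weight storage in GB (10^9 bytes), ignoring
    per-block quantization scales, activations, and KV caches."""
    return n_params * bits_per_weight / 8 / 1e9

# ~7B weights at 8 bits is about 7 GB, close to the 7.2 GB GGUF file
# (the difference covers quantization scales and metadata). FP16 would
# need roughly twice that, which is why Q8 is what makes the 16 GB
# minimum-RAM target plausible.
print(weight_storage_gb(7e9, 8))   # 7.0
print(weight_storage_gb(7e9, 16))  # 14.0
```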
## License
Apache 2.0 - Free for personal and commercial use.
## Links
Part of the TARX local-first AI platform.
TX-8G | TX-12G | TX-16G | TX-M-72B | TX-VOICE