Purple Squirrel R1 (GGUF)

GGUF quantized versions of Purple Squirrel R1 for local inference via llama.cpp, Ollama, or LM Studio.


Available Quantizations

| File | Quant | Size | Quality | Speed | Use Case |
|------|-------|------|---------|-------|----------|
| purple-squirrel-r1-f16.gguf | F16 | 15 GB | Best | Slowest | Reference, re-quantization |
| purple-squirrel-r1-Q8_0.gguf | Q8_0 | ~8 GB | Excellent | Fast | High-quality local inference |
| purple-squirrel-r1-Q5_K_M.gguf | Q5_K_M | ~5.5 GB | Great | Faster | Balanced quality/speed |
| purple-squirrel-r1-Q4_K_M.gguf | Q4_K_M | 4.6 GB | Good | Fastest | Memory-constrained devices |

Model Details

| Property | Value |
|----------|-------|
| Base Model | DeepSeek-R1-Distill-Llama-8B |
| Parameters | 8B |
| Architecture | Llama |
| Context Length | 4096 tokens |
| Specialization | AIDP platform ops, video analysis, blockchain |
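
To confirm what a downloaded file contains (architecture, context length, quantization type), the GGUF header can be inspected locally. A minimal sketch, assuming the `gguf` Python package maintained in the llama.cpp repository is installed; the Q4_K_M file is used here only as an example:

# Sketch: print the GGUF metadata (architecture, context length, quant type) of a local file
pip install gguf
gguf-dump purple-squirrel-r1-Q4_K_M.gguf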

Usage

Ollama (Recommended)

A ready-to-use Modelfile is included in this repo.

# Download the Modelfile and a GGUF
huggingface-cli download purplesquirrelnetworks/purple-squirrel-r1-gguf \
  Modelfile purple-squirrel-r1-Q5_K_M.gguf --local-dir .

# Create and run
ollama create purple-squirrel-r1 -f Modelfile
ollama run purple-squirrel-r1

To use a different quantization, edit the FROM line in the Modelfile.
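
For example, a sketch of switching to the Q4_K_M file. The `sed` invocation assumes GNU sed, the `-q4` tag is just an illustrative name, and any TEMPLATE or PARAMETER lines the repo's Modelfile already sets should be left as-is:

# Download the smaller quant and point the Modelfile's FROM line at it
huggingface-cli download purplesquirrelnetworks/purple-squirrel-r1-gguf \
  purple-squirrel-r1-Q4_K_M.gguf --local-dir .
sed -i 's|^FROM .*|FROM ./purple-squirrel-r1-Q4_K_M.gguf|' Modelfile
ollama create purple-squirrel-r1-q4 -f Modelfile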

llama.cpp

./llama-cli -m purple-squirrel-r1-Q4_K_M.gguf \
  -p "Explain how distributed GPU inference reduces costs" \
  -n 500 -c 4096
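
llama.cpp can also serve the model behind an OpenAI-compatible HTTP API via `llama-server`; a minimal sketch (the port and quant choice here are arbitrary):

./llama-server -m purple-squirrel-r1-Q5_K_M.gguf -c 4096 --port 8080

# In another shell, query the server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain how distributed GPU inference reduces costs"}], "max_tokens": 500}'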

LM Studio

  1. Download any GGUF file from this repo
  2. Open LM Studio → Load Model → Select the file
  3. Start chatting
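
LM Studio can also expose the loaded model through its built-in local server (OpenAI-compatible, default port 1234). A sketch, assuming the server is enabled and a single model is loaded:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize what the AIDP platform does"}], "max_tokens": 300}'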

Choosing a Quantization

  • 16 GB+ RAM: Use Q8_0 for the best quality
  • 8–16 GB RAM: Use Q5_K_M for a good balance of quality and speed
  • <8 GB RAM: Use Q4_K_M for the fastest inference
  • Re-quantizing: Start from F16 (see the sketch below)
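
Re-quantization uses llama.cpp's `llama-quantize` tool on the F16 reference file; a sketch, where the Q6_K target is just an example of a quant not published in this repo:

# Sketch: build a new quant from the F16 reference (requires a local llama.cpp build)
huggingface-cli download purplesquirrelnetworks/purple-squirrel-r1-gguf \
  purple-squirrel-r1-f16.gguf --local-dir .
./llama-quantize purple-squirrel-r1-f16.gguf purple-squirrel-r1-Q6_K.gguf Q6_K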

Related Resources

| Resource | Link |
|----------|------|
| Full Model (safetensors) | purple-squirrel-r1 |
| Multichain Edition (MLX) | purple-squirrel-r1-multichain |
| LoRA Adapters | purple-squirrel-r1-multichain-lora |
| Research Paper | AIDP Neural Cloud |
| Research Paper | AIDP Video Forge |
| Coldstar Whitepaper | coldstar-whitepaper |
| Training Data | multichain-day-training |
| Full Collection | Purple Squirrel AI |

Citation

@misc{purplesquirrel-r1-gguf-2025,
  title={Purple Squirrel R1 GGUF Quantizations},
  author={Karsten, Matthew},
  year={2025},
  publisher={Purple Squirrel Media},
  howpublished={\url{https://huggingface.co/purplesquirrelnetworks/purple-squirrel-r1-gguf}},
  note={GGUF quantized DeepSeek-R1-Distill-Llama-8B for local inference}
}

License

MIT


Built by Purple Squirrel Media | GitHub
