🚨 EXPERIMENTAL 🚨

This is an experimental GGUF conversion following the method described by sszymczyk/fairydreaming, which makes DeepSeek V3.2 usable for inference in llama.cpp by treating it as a standard DeepSeek V3 model with dense attention. I've embedded the chat template from deepseek-ai/DeepSeek-V3.2-Exp in the metadata for chat completions; however, tool calling likely does not work properly, as DeepSeek changed the templating for the full release of DeepSeek V3.2, forgoing a Jinja template in favor of Python-based formatting.
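As a quick sketch of how the embedded template would be exercised (the model path below is a placeholder, not the actual shard filename), llama.cpp's llama-server can render the metadata chat template with the --jinja flag:

```shell
# Sketch: serve the GGUF with llama.cpp and use the chat template
# embedded in the metadata (--jinja enables Jinja rendering for
# /v1/chat/completions). The model path is a placeholder.
./llama-server -m path/to/DeepSeek-V3.2-dense-attn.gguf --jinja -c 8192
```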


This is a custom quant of deepseek-ai/DeepSeek-V3.2 using the following mix:

  • Q8_0 for the default quantization type (attention, shared experts, etc.)
  • Q4_K for the FFN_UP and FFN_GATE tensors
  • Q5_K for the FFN_DOWN tensors

The idea is that, given the huge size of the FFN tensors relative to the rest of the model, this mix should achieve better quality while keeping the overall model smaller than a comparable naive quantization.
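To make the size trade-off concrete, here is a rough back-of-the-envelope estimate using the known ggml bits-per-weight figures for these quant types. The parameter-split fractions below are assumptions for illustration only, not measured from the actual model:

```python
# Known ggml bits-per-weight: Q8_0 = 8.5, Q4_K = 4.5, Q5_K = 5.5.
BPW = {"Q8_0": 8.5, "Q4_K": 4.5, "Q5_K": 5.5}

TOTAL_PARAMS = 671e9  # reported model size

# Assumed parameter split (illustrative): expert FFN weights dominate a
# MoE model; ffn_up + ffn_gate are two of the three FFN matrices,
# ffn_down is the third, and everything else gets Q8_0.
fractions = {
    "Q4_K": 0.60,  # ffn_up + ffn_gate (assumption)
    "Q5_K": 0.30,  # ffn_down (assumption)
    "Q8_0": 0.10,  # attention, shared experts, embeddings, etc. (assumption)
}

size_bytes = sum(TOTAL_PARAMS * frac * BPW[q] / 8 for q, frac in fractions.items())
print(f"mixed quant ≈ {size_bytes / 1e9:.0f} GB")  # ≈ 436 GB

uniform_q8 = TOTAL_PARAMS * BPW["Q8_0"] / 8
print(f"uniform Q8_0 ≈ {uniform_q8 / 1e9:.0f} GB")  # ≈ 713 GB
```

Under these assumed fractions the mixed recipe lands well under uniform Q8_0 while keeping the size-sensitive non-FFN tensors at full Q8_0 quality.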

The model is additionally split with --no-tensor-first-split to make the metadata easier to edit.
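A sketch of how such a split might be produced with llama.cpp's llama-gguf-split tool (not the exact command used here; filenames and shard size are placeholders):

```shell
# Split a single GGUF into shards. --no-tensor-first-split keeps the
# first shard metadata-only, so metadata can be rewritten without
# touching the tensor data in the later shards.
./llama-gguf-split --split --split-max-size 45G --no-tensor-first-split \
    DeepSeek-V3.2-dense-attn.gguf DeepSeek-V3.2-dense-attn
```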

Model size: 671B params (architecture: deepseek2)