🚨 EXPERIMENTAL 🚨
This is an experimental GGUF conversion following the method described by sszymczyk/fairydreaming, which makes DeepSeek V3.2 usable for inference in llama.cpp by treating it as a standard DeepSeek V3 model with dense attention. The chat template from deepseek-ai/DeepSeek-V3.2-Exp is embedded in the metadata for chat completions; however, tool calling likely does not work properly, as DeepSeek changed the templating for the full release of DeepSeek V3.2, forgoing a Jinja template in favor of Python-based formatting.
This is a custom quant of deepseek-ai/DeepSeek-V3.2 with the following mix:
- Q8_0 for the default quantization type (attention, shared experts, etc.)
- Q4_K for the FFN_UP and FFN_GATE tensors
- Q5_K for the FFN_DOWN tensors
The idea is that, given the huge size of the FFN tensors relative to the rest of the model, quantizing them more aggressively while keeping everything else at Q8_0 should achieve better quality at a smaller overall size than a comparable naive quantization.
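A mix like this can be produced with llama.cpp's `llama-quantize` and its per-tensor type overrides. The sketch below is an assumption about how this quant could have been made, not the author's exact command; the file names are placeholders:

```shell
# Hypothetical sketch: default type Q8_0, with pattern-based overrides
# forcing FFN up/gate to Q4_K and FFN down to Q5_K.
# Input/output GGUF paths are placeholders.
./llama-quantize \
  --tensor-type ffn_up=q4_k \
  --tensor-type ffn_gate=q4_k \
  --tensor-type ffn_down=q5_k \
  DeepSeek-V3.2-F16.gguf DeepSeek-V3.2-custom.gguf Q8_0
```

Any tensor not matched by an override (attention, shared experts, etc.) falls back to the default type given as the last argument.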
The model is additionally split with `--no-tensor-first-split` to allow easier editing of the metadata.
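Splitting is done with llama.cpp's `llama-gguf-split` tool; with `--no-tensor-first-split`, the first shard holds only metadata, so it can be edited without rewriting the (much larger) tensor shards. A sketch with placeholder paths and an assumed shard size:

```shell
# Hypothetical sketch: split the quantized GGUF into shards, keeping the
# first shard tensor-free so its metadata is easy to edit in place.
# File names and the 50G max shard size are placeholders.
./llama-gguf-split --split \
  --no-tensor-first-split \
  --split-max-size 50G \
  DeepSeek-V3.2-custom.gguf DeepSeek-V3.2-custom-split
```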
Model tree for Doctor-Shotgun/DeepSeek-V3.2-dense-attn-GGUF:
- Base model: deepseek-ai/DeepSeek-V3.2-Exp-Base