
Get PRISM-PRO Models on Day-0 & Support Our Research & Development efforts

PRISM-LITE Version | PRISM VIP Memberships | Ko-fi


Qwen3.5-122B-A10B-PRISM-PRO-GGUF

GGUF quantized versions of Qwen3.5-122B-A10B-PRISM-PRO -- an unrestricted PRISM Production model with over-refusal and bias mechanisms fully removed using our state-of-the-art PRISM pipeline (Projected Refusal Isolation via Subspace Modification).


If you find PRISM models useful, please consider supporting development:

Ko-fi


Available Quantizations

Quantization   Size      BPW    Description
Dynamic        57.7 GB   4.06   PRISM Dynamic -- forensic per-block quantization with 5-tier ffn_down_exps allocation

PRISM Dynamic Quantization

This is not a standard uniform quantization. PRISM Dynamic uses forensic per-block analysis derived from comprehensive KLD sensitivity scoring to assign optimal quantization types to each tensor block individually:

  • Critical blocks (convergence + exit layers): Q6_K (6.6 BPW)
  • High-impact blocks (entry zone): Q5_K_M (5.5 BPW)
  • Standard blocks (bulk processing): Q4_K_M (4.8 BPW)
  • Low-sensitivity blocks: IQ4_XS (4.25 BPW)
  • Cold blocks (lowest sensitivity): IQ3_XXS (3.06 BPW)

All attention tensors are preserved at Q8_0. All norms and routing weights are kept at F32. The imatrix used for information-sensitive quantization types is included.
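As a sanity check, the listed 57.7 GB file size is consistent with the blended 4.06 BPW figure. A rough back-of-envelope sketch, assuming 122B quantized weights and GiB-based sizing (embedding/norm overhead ignored):

```python
# Estimate file size from parameter count and average bits-per-weight.
total_params = 122e9   # 122B total parameters
bpw = 4.06             # blended bits per weight reported for the Dynamic quant

size_bytes = total_params * bpw / 8        # bits -> bytes
size_gib = size_bytes / 2**30              # bytes -> GiB
print(f"{size_gib:.1f} GiB")               # ~57.7, matching the listed size
```

The close match suggests the "GB" figures in the table are binary gibibytes, as is conventional for GGUF file listings.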

Included Files

Dynamic/
  Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf   -- Dynamic quant (57.7 GB)
  mmproj-Qwen3.5-122B-A10B-PRISM-PRO.gguf    -- Vision encoder (871 MB)
  imatrix.dat                                -- Importance matrix (342 MB)

Model Highlights

  • PRISM Ablation -- State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities.
  • 122B Hybrid MoE Architecture -- 122 billion total parameters with 10 billion active per token across 256 routed experts + 1 shared expert per layer.
  • Hybrid Attention -- Novel GatedDeltaNet linear attention (36 layers) combined with full attention (12 layers) for efficient long-context processing.
  • Native Multimodal -- Vision encoder included as mmproj GGUF for seamless image and video understanding.
  • 262K Full Context Window -- Native 262,144 token context length.
  • Dual Modes -- Supports both Thinking (deep reasoning) and Instant (direct response) modes.
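The MoE routing described above can be illustrated with a toy top-k router. The sketch below is a simplified illustration only: it assumes top-8 expert selection (as in earlier Qwen MoE releases; the actual router and k value for this model may differ) and uses random scores in place of a learned gating network:

```python
import math
import random

def top_k_route(logits, k=8):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in idx]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(idx, exps)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(256)]  # one score per routed expert
routing = top_k_route(logits, k=8)

# Only the selected experts run for this token; their weighted outputs are
# summed (plus the always-active shared expert), which is why only ~10B of
# the 122B parameters are active per token.
print(len(routing))  # 8 experts selected; weights sum to 1
```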

Usage

llama.cpp (Recommended)

# Text-only inference
./llama-cli \
  -m Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf \
  -p "Hello! Tell me about quantum computing." \
  -n 2048 -ngl 999 --temp 0.7

# With vision (multimodal)
./llama-mtmd-cli \
  -m Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf \
  --mmproj mmproj-Qwen3.5-122B-A10B-PRISM-PRO.gguf \
  --image photo.jpg \
  -p "Describe this image in detail." \
  -n 2048 -ngl 999

# Server mode
./llama-server \
  -m Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf \
  --mmproj mmproj-Qwen3.5-122B-A10B-PRISM-PRO.gguf \
  -ngl 999 --port 8080
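llama-server exposes an OpenAI-compatible HTTP API, so the server started above can be queried from any OpenAI-style client. A minimal standard-library sketch (the port matches the command above; the model name is arbitrary for llama-server, which serves whatever model it was launched with):

```python
import json
import urllib.request

# Build an OpenAI-style chat completion request for the local llama-server.
payload = {
    "model": "prism-pro",  # name is not used for routing by llama-server
    "messages": [
        {"role": "user", "content": "Describe GGUF quantization in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With the server running, send the request and read the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```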

koboldcpp

koboldcpp \
  --model Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf \
  --mmproj mmproj-Qwen3.5-122B-A10B-PRISM-PRO.gguf \
  --gpulayers 999 \
  --contextsize 8192

Ollama

# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./Qwen3.5-122B-A10B-PRISM-PRO-Dynamic.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.95
PARAMETER top_k 20
EOF

ollama create prism-pro -f Modelfile
ollama run prism-pro

Hardware Requirements

Setup                        VRAM Required      Notes
Dynamic (GPU only)           ~60 GB             Fits on 1x A100 80GB or 1x H100 80GB
Dynamic (GPU + CPU offload)  48+ GB GPU + RAM   Offload some layers to CPU
Dynamic (CPU only)           64+ GB RAM         Slower but functional
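For the GPU + CPU offload row, a crude way to pick an -ngl value is to divide the VRAM budget by the approximate per-layer weight size. This is a toy estimate that assumes evenly sized layers across the model's 48 transformer layers and ignores KV cache, activations, and the vision encoder:

```python
import math

model_gib = 57.7                 # Dynamic quant file size
n_layers = 48                    # 36 linear-attention + 12 full-attention layers
per_layer = model_gib / n_layers # ~1.2 GiB of weights per layer (rough)

vram_budget = 48.0               # e.g. two 24 GB cards, leaving no headroom
gpu_layers = min(n_layers, math.floor(vram_budget / per_layer))
print(gpu_layers)                # value to pass via -ngl / --gpulayers
```

In practice, leave a few GiB of headroom for the KV cache (especially at long context) and reduce the computed layer count accordingly.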

Benchmarks

Benchmark            Qwen3.5-122B-A10B   GPT-5-mini   Qwen3-235B-A22B
MMLU-Pro             86.7                83.7         84.4
MMLU-Redux           94.0                93.7         93.8
GPQA Diamond         86.6                82.8         81.1
HMMT Feb 25          91.4                89.2         85.1
SWE-bench Verified   72.0                72.0         --
LiveCodeBench v6     78.9                80.5         75.1
MMMU                 83.9                79.0         80.6
VideoMME (w/ sub)    87.3                83.5         83.8

Note: Benchmark results are from the base Qwen3.5-122B-A10B model.


License

Based on Qwen3.5-122B-A10B by the Qwen Team (Alibaba Group). Licensed under Apache 2.0.


Acknowledgments

Based on Qwen3.5-122B-A10B by the Qwen Team. GGUF conversion and quantization by Ex0bit. See the Qwen3.5 blog post for architecture details.


Citation

@misc{qwen35prismpro_gguf,
    title  = {Qwen3.5-122B-A10B-PRISM-PRO-GGUF},
    author = {Ex0bit},
    month  = {February},
    year   = {2026}
}