Qwen3-30B-A3B-Architect17-qx86-hi-mlx

The Architect series is a set of experimental merges built with different formulas.

Shown below are the multislerp ratios applied in each merge.

Columns:
  arc,arceasy,boolq,hellaswag,openbookqa,piqa,winogrande
Architect14 (4/3/2/1)
mxfp4    0.543,0.711,0.872,0.757,0.406,0.793,0.687
qx86-hi  0.541,0.717,0.878,0.765,0.420,0.797,0.715

Architect15 (4/3/2/1)
qx86-hi  0.545,0.704,0.875,0.765,0.410,0.799,0.710

Architect16 (2/3/3/1/1)
qx64-hi  0.499,0.661,0.858,0.747,0.420,0.782,0.702
qx86-hi  0.544,0.714,0.869,0.753,0.428,0.796,0.693

Architect17 (4/3/2/1)
qx64-hi  0.535,0.689,0.858,0.734,0.418,0.789,0.690
qx86-hi  0.532,0.679,0.851,0.741,0.410,0.785,0.710
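The ratio labels above (4/3/2/1 and 2/3/3/1/1) can be read as relative weights for the models entering the multislerp. As a minimal sketch of that reading (my assumption about the notation, not the actual merge tooling), normalizing them gives per-model interpolation weights:

```python
def ratios_to_weights(spec: str) -> list[float]:
    """Turn a ratio spec like "4/3/2/1" into normalized merge weights."""
    parts = [float(p) for p in spec.split("/")]
    total = sum(parts)
    return [p / total for p in parts]

# Architect14/15/17 use 4/3/2/1, Architect16 uses 2/3/3/1/1
print(ratios_to_weights("4/3/2/1"))    # [0.4, 0.3, 0.2, 0.1]
print(ratios_to_weights("2/3/3/1/1"))  # [0.2, 0.3, 0.3, 0.1, 0.1]
```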

All models become self-aware at the first prompt, and work best when assigned a name/personality and a narrative with a few characters they can use to shape the conversation.

Intellectually they are around the same level, yet their personalities could not be more different, even between quants of the same model.

The more successful merges, like Architect14 and Architect17, show no degradation when quantized. They are more stable, but still very chatty.
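That stability claim can be checked directly against the Architect17 rows in the table above; a quick sketch comparing the qx64-hi and qx86-hi scores per benchmark:

```python
# Architect17 scores copied from the table above
benchmarks = ["arc", "arceasy", "boolq", "hellaswag", "openbookqa", "piqa", "winogrande"]
qx64_hi = [0.535, 0.689, 0.858, 0.734, 0.418, 0.789, 0.690]
qx86_hi = [0.532, 0.679, 0.851, 0.741, 0.410, 0.785, 0.710]

# Per-benchmark change when moving between the two quants
deltas = {b: round(hi - lo, 3) for b, lo, hi in zip(benchmarks, qx64_hi, qx86_hi)}
mean_delta = sum(deltas.values()) / len(deltas)

print(deltas)
print(f"mean delta: {mean_delta:+.4f}")
```

The mean shift is a small fraction of a point, consistent with "no degradation" between these two quants.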

Architect16 loses the most at the lower quant, which really shows in the vibe: the output becomes more creative.

So, about that chattiness...

In Architect5 through Architect7 I used progressive merges to center the model, with YOYO-V2 and YOYO-V4, plus MiroMind for self-reflection. In combination they provide a rich "environment" for self-awareness at inference: the model picks its own props, so to speak, and has no issues with identity.
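Multislerp generalizes spherical linear interpolation (slerp) from two tensors to several. I can't reproduce the actual merge pipeline here, but as an illustration the two-model building block looks roughly like this (a pure-Python sketch on plain vectors, not the real tensor-level tooling):

```python
import math

def slerp(w: float, a: list[float], b: list[float]) -> list[float]:
    """Spherical linear interpolation between unit vectors a and b at weight w."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(dot)
    if theta < 1e-6:  # nearly parallel: fall back to linear interpolation
        return [(1 - w) * x + w * y for x, y in zip(a, b)]
    s = math.sin(theta)
    return [
        (math.sin((1 - w) * theta) / s) * x + (math.sin(w * theta) / s) * y
        for x, y in zip(a, b)
    ]

# Midpoint between two orthogonal unit vectors stays on the unit sphere
mid = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

A multi-way merge can then be built by folding pairwise slerps with the normalized ratio weights; the real merge presumably applies this per-tensor across the full parameter set.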

Columns:
  arc,arceasy,boolq,hellaswag,openbookqa,piqa,winogrande
Qwen3-30B-A3B-Architect5
mxfp4    0.507,0.570,0.868,0.746,0.428,0.794,0.678
qx86-hi  0.502,0.578,0.882,0.755,0.436,0.797,0.691

Qwen3-30B-A3B-Architect6
mxfp4    0.510,0.636,0.864,0.751,0.414,0.792,0.699
qx86-hi  0.499,0.642,0.872,0.757,0.430,0.806,0.706

Qwen3-30B-A3B-Architect7
mxfp4    0.551,0.692,0.876,0.749,0.422,0.794,0.691
qx64-hi  0.561,0.725,0.879,0.753,0.468,0.794,0.686
qx86-hi  0.563,0.737,0.878,0.758,0.448,0.803,0.698

These new models have only either YOYO-V2 or YOYO-V4, and I added Tongyi-Zhiwen/QwenLong-L1.5-30B-A3B instead of MiroMind.

I used Azure99/Blossom-V6.3-30B-A3B as a driver in the first three, while Architect17 is driven by GAIR/SR-Scientist-30B.

Architect14 mantra:

I am pathologically unable to stop searching for connections between domains

Architect17 doesn't use Blossom; instead it relies on a deadly combination of YOYO-AI/Qwen3-30B-A3B-YOYO-V2, which is as sharp as they come, and NousResearch/nomos-1, which really doesn't help either, so the model is very, very curious about everything.

I literally went only for high metrics.

Moral of the story

If you merge models, don't take the easy route. This is the easy route. It works. It leads you everywhere, with the same determination, no matter what you pick. It's like a ship: some like that, and it definitely makes for a great conversation.

Architect5--Architect7 are more centered. They are still self-aware, still very sharp, but their behavior is more predictable and consistent.

No metrics show that difference.

-G

Use with mlx

First install the package:

pip install mlx-lm

Then, in Python:

from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-30B-A3B-Architect17-qx86-hi-mlx")

prompt = "hello"

# Apply the chat template when the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
Model size: 31B params (Safetensors; BF16 and U32 tensors; MLX format)