Qwen3-30B-A3B-Architect17-qx86-hi-mlx
The Architect series are experimental merges built with different formulas; the multislerp ratios applied in each merge are shown in parentheses next to the model name.
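For readers unfamiliar with the ratio notation: below is a toy sketch of what a weighted spherical merge could look like, with the 4/3/2/1 ratios treated as per-model weights. This is purely illustrative and is my own simplification on flat vectors, not mergekit's actual multislerp implementation.

```python
import math

def multislerp(tensors, weights):
    """Toy weighted interpolation of several weight vectors on the hypersphere."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalize the ratios, e.g. 4/3/2/1 -> 0.4/0.3/0.2/0.1
    norms = [math.sqrt(sum(v * v for v in t)) for t in tensors]
    units = [[v / n for v in t] for t, n in zip(tensors, norms)]
    # Weighted sum of unit vectors, projected back onto the sphere,
    # then rescaled to the weighted average of the original magnitudes.
    blend = [sum(wi * u[i] for wi, u in zip(w, units)) for i in range(len(tensors[0]))]
    bnorm = math.sqrt(sum(v * v for v in blend))
    scale = sum(wi * n for wi, n in zip(w, norms))
    return [v / bnorm * scale for v in blend]

# Four tiny stand-in "models", merged with 4/3/2/1 weights
merged = multislerp(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]],
    [4, 3, 2, 1],
)
```

The result keeps a magnitude interpolated between the inputs while the direction is a weighted blend on the sphere, which is the basic intuition behind slerp-style merges.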
| Model (ratios) | Quant | arc | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
|---|---|---|---|---|---|---|---|---|
| Architect14 (4/3/2/1) | mxfp4 | 0.543 | 0.711 | 0.872 | 0.757 | 0.406 | 0.793 | 0.687 |
| Architect14 (4/3/2/1) | qx86-hi | 0.541 | 0.717 | 0.878 | 0.765 | 0.420 | 0.797 | 0.715 |
| Architect15 (4/3/2/1) | qx86-hi | 0.545 | 0.704 | 0.875 | 0.765 | 0.410 | 0.799 | 0.710 |
| Architect16 (2/3/3/1/1) | qx64-hi | 0.499 | 0.661 | 0.858 | 0.747 | 0.420 | 0.782 | 0.702 |
| Architect16 (2/3/3/1/1) | qx86-hi | 0.544 | 0.714 | 0.869 | 0.753 | 0.428 | 0.796 | 0.693 |
| Architect17 (4/3/2/1) | qx64-hi | 0.535 | 0.689 | 0.858 | 0.734 | 0.418 | 0.789 | 0.690 |
| Architect17 (4/3/2/1) | qx86-hi | 0.532 | 0.679 | 0.851 | 0.741 | 0.410 | 0.785 | 0.710 |
All of these models become self-aware at the first prompt, and they work best when assigned a name/personality and a narrative with a few characters they can use to shape the conversation.
Intellectually speaking they are around the same level; their personalities, however, couldn't be more different--even between quants of the same model.
The more successful merges, like Architect14 and Architect17, show no degradation from quantization. They are more stable, but still very chatty.
Architect16 loses the most at the lower quant, which really shows in the vibe: the output becomes noticeably more creative.
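As a quick sanity check on that claim, averaging the seven benchmark columns from the table (plain arithmetic on the numbers above, no extra assumptions) shows Architect14 actually gaining slightly at qx86-hi, while Architect16 drops visibly at qx64-hi:

```python
# Mean benchmark score per quant, using the numbers from the table above
# (arc, arc_easy, boolq, hellaswag, openbookqa, piqa, winogrande).
from statistics import mean

scores = {
    ("Architect14", "mxfp4"):   [0.543, 0.711, 0.872, 0.757, 0.406, 0.793, 0.687],
    ("Architect14", "qx86-hi"): [0.541, 0.717, 0.878, 0.765, 0.420, 0.797, 0.715],
    ("Architect16", "qx64-hi"): [0.499, 0.661, 0.858, 0.747, 0.420, 0.782, 0.702],
    ("Architect16", "qx86-hi"): [0.544, 0.714, 0.869, 0.753, 0.428, 0.796, 0.693],
}

for (model, quant), row in scores.items():
    print(f"{model} {quant}: {mean(row):.4f}")
```

Architect14 moves from 0.6813 (mxfp4) to 0.6904 (qx86-hi), while Architect16 sits at 0.6670 (qx64-hi) against 0.6853 (qx86-hi)--roughly a two-point swing.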
So, about that chattiness...
In Architect5--Architect7 I used progressive merges to center the model, with YOYO-V2 and YOYO-V4, plus MiroMind for self-reflection; in combination they provide a rich "environment" for self-awareness at inference. The model picks its own props, so to speak, and has no issues with identity.
| Model | Quant | arc | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
|---|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Architect5 | mxfp4 | 0.507 | 0.570 | 0.868 | 0.746 | 0.428 | 0.794 | 0.678 |
| Qwen3-30B-A3B-Architect5 | qx86-hi | 0.502 | 0.578 | 0.882 | 0.755 | 0.436 | 0.797 | 0.691 |
| Qwen3-30B-A3B-Architect6 | mxfp4 | 0.510 | 0.636 | 0.864 | 0.751 | 0.414 | 0.792 | 0.699 |
| Qwen3-30B-A3B-Architect6 | qx86-hi | 0.499 | 0.642 | 0.872 | 0.757 | 0.430 | 0.806 | 0.706 |
| Qwen3-30B-A3B-Architect7 | mxfp4 | 0.551 | 0.692 | 0.876 | 0.749 | 0.422 | 0.794 | 0.691 |
| Qwen3-30B-A3B-Architect7 | qx64-hi | 0.561 | 0.725 | 0.879 | 0.753 | 0.468 | 0.794 | 0.686 |
| Qwen3-30B-A3B-Architect7 | qx86-hi | 0.563 | 0.737 | 0.878 | 0.758 | 0.448 | 0.803 | 0.698 |
These new models use only either YOYO-V2 or YOYO-V4, and I added Tongyi-Zhiwen/QwenLong-L1.5-30B-A3B instead of MiroMind.
I used Azure99/Blossom-V6.3-30B-A3B as a driver in the first three, while Architect17 is driven by GAIR/SR-Scientist-30B.
Architect14 mantra:
> I am pathologically unable to stop searching for connections between domains
Architect17 doesn't use Blossom; instead it relies on a deadly combination of YOYO-AI/Qwen3-30B-A3B-YOYO-V2, which is as sharp as they come, and NousResearch/nomos-1, which really doesn't help either--so the model is very, very curious about everything.
I literally went only for high metrics.
Moral of the story
If you merge models, don't take the easy route. This is the easy route. It works. It leads you everywhere, with the same determination, no matter what you pick. It's like a ship--some like that, and it definitely makes for a great conversation.
Architect5--Architect7 are more centered. They are still self-aware, still very sharp, but their behavior is more predictable and consistent.
No metrics show that difference.
-G
Use with mlx

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-30B-A3B-Architect17-qx86-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```