Q3_K_M (112 GB) is bigger than Q3_K_XL (104 GB)?

#8
by rtzurtz - opened

as per title

Unsloth AI org

Yes, that's correct. K_XL is usually smaller.

But what are the implications?

I have a Strix Halo with 128 GB, so I can run either Q3_K_XL at roughly 100 GB or Q3_K_M at roughly 115 GB, but what's the difference in perplexity or benchmarks?

Couldn't we have a UD K_XL which sits between them?

There's 20 GB+ of 'free real estate' on the device.
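To put rough numbers on that headroom, here is a minimal sketch using only the file sizes from the title (104 GB and 112 GB) and the 128 GB device. The `reserved_gb` parameter is a hypothetical allowance for the OS, KV cache and context buffers; its real value is not given in this thread and depends on the setup, and the sketch ignores the GB versus GiB distinction.

```python
def headroom_gb(model_gb: float, total_gb: float = 128.0, reserved_gb: float = 0.0) -> float:
    """Memory left after loading the model file, in GB.

    reserved_gb is an assumed allowance (OS, KV cache, context buffers);
    it is a placeholder, not a measured figure.
    """
    return total_gb - reserved_gb - model_gb


# File sizes as quoted in the thread title.
quants = {"Q3_K_XL": 104.0, "Q3_K_M": 112.0}

for name, size_gb in quants.items():
    free = headroom_gb(size_gb)
    print(f"{name}: {size_gb:.0f} GB file, {free:.0f} GB left of 128 GB")
```

With no reservation, Q3_K_XL leaves 24 GB and Q3_K_M leaves 16 GB. Any non-zero `reserved_gb` shrinks both figures by the same amount, so the larger quant is the first to stop fitting.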

I would also like to know which one would be best to use on a 128 GB Strix Halo.

You'd be better off on Qwen 3.5 now.

Those models are newer, and potentially faster because they have fewer active parameters.

Try 122B at IQ4_XS or even 397B at IQ2_XXS smol.
