Q3_K_M (112 GB) is bigger than Q3_K_XL (104 GB)?
#8, opened by rtzurtz
as per title
Yes, that's correct. K_XL is usually smaller.
But what are the implications?
I have a Strix Halo with 128 GB, so I can run either Q3_K_XL at roughly 100 GB or Q3_K_M at roughly 115 GB. What's the difference in perplexity or benchmarks?
Couldn't we have a UD K_XL which sits between them?
There's 20 GB+ of 'free real estate' on the device.
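To put rough numbers on that headroom, here is a minimal sketch. It uses the file sizes from the thread title (104 GB and 112 GB). The `RESERVED_GB` value is purely a hypothetical placeholder for the OS, KV cache, and compute buffers, and the helper names are made up for illustration. How much of the 128 GB is actually usable for the model depends on the system setup, so treat this as back-of-the-envelope arithmetic only.

```python
# File sizes come from the thread title; everything else is an assumption.
QUANT_SIZES_GB = {"Q3_K_XL": 104, "Q3_K_M": 112}

TOTAL_GB = 128      # nominal memory of the device discussed in the thread
RESERVED_GB = 12    # hypothetical reserve: OS, KV cache, compute buffers


def headroom_gb(total_gb, model_gb, reserved_gb=0.0):
    """Memory left after loading the model file and setting aside a reserve."""
    return total_gb - model_gb - reserved_gb


def fits(total_gb, model_gb, reserved_gb=0.0):
    """True if the model file plus the reserve fits in total memory."""
    return headroom_gb(total_gb, model_gb, reserved_gb) >= 0


if __name__ == "__main__":
    for name, size in QUANT_SIZES_GB.items():
        raw = headroom_gb(TOTAL_GB, size)
        net = headroom_gb(TOTAL_GB, size, RESERVED_GB)
        print(f"{name}: {size} GB file, {raw} GB nominal headroom, "
              f"{net} GB after a {RESERVED_GB} GB reserve")
```

With no reserve, the nominal headroom is 24 GB for Q3_K_XL and 16 GB for Q3_K_M. Changing `RESERVED_GB` shows how quickly the larger file runs out of room once context and runtime buffers are counted.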
I would also like to know which one would be best to use on a 128 GB Strix Halo.
You'd be better off on Qwen 3.5 now.
They're newer and potentially faster, since they have fewer active parameters.
Try 122B at IQ4_XS or even 397B at IQ2_XXS smol.