Q4 quant
Hi,
I recently found your quants.
I would much appreciate a Q4_XXL of this particular model (or any other Qwen3.5-27B).
Do you have more than 24GB of VRAM available? A Q4_K_XXL made with the same settings as the Q6 would be 23GB.
I have exactly 24GB of VRAM and 64GB of RAM. Q6 is just too much; Q4 would be ideal, as I can tune --n-gpu-layers to utilize my VRAM exactly (rough example below).
I'm currently on the hunt for an improved Qwen3.5-27B at Q4, and I really like your XXL quants.
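For reference, this is roughly how I plan to run it (a sketch using llama.cpp's llama-server; the filename, context size, and layer count are placeholders, not the actual repo files):

```bash
# A sketch, not the exact command: the model filename and numbers are placeholders.
# Start with a conservative --n-gpu-layers and raise it until the model no longer
# fits in 24GB of VRAM; the layers that don't fit stay in system RAM.
./llama-server -m Qwen3.5-27B-Q4_K_XXL.gguf -c 16384 --n-gpu-layers 40
```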
Depending on what you need the LLM for, those quants may be hit or miss. I find they perform better when working with context, and they feel smarter, but I still have trouble finding the best recipe for a given model.
I made several quants of this model with different settings for the attention and ffn_down weights, and found out, in my limited testing, that a quant from one of my first tries, which omitted setting attn_output to higher precision, actually performs better than the quants I made later with more precision for that weight... also, setting all attention weights to bf16 did not necessarily improve model performance. This is so weird.
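To give an idea, my recipe experiments look roughly like this (a sketch using llama-quantize from llama.cpp; the patterns and types shown are just the knobs I keep turning, not the final settings, and the per-tensor --tensor-type override assumes a recent enough llama.cpp build):

```bash
# A sketch of the kind of recipe I test, not the exact settings of the upload.
# --tensor-type forces a different quant type for tensors whose name matches
# the pattern; everything else follows the base Q4_K_M mixture.
# (I also tried attn_output=q8_0 and bf16 for the attention tensors; mixed results.)
./llama-quantize \
  --tensor-type ffn_down=q6_k \
  --tensor-type attn_v=q8_0 \
  model-bf16.gguf Qwen3.5-27B-Q4_K_XXL.gguf Q4_K_M
```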
But again, my testing is a bit too limited. They sure work for me though :P and if I find a better recipe later, I'll eventually reupload a newer version, so don't be surprised if I remove this repo and reupload later :P
BTW, I'm uploading a Q4 now. It doesn't push the attention layers as hard as some of my other quants, and maybe bf16 isn't always the way to go when making smaller quants... it should be fine... or you tell me?
I have a slow upload, so it will take about a day.
BTW2: I don't normally make quants on request, but I still have the full-precision weights of this model and still plan to test some recipes to see if I can find better settings for the weights... so you're lucky with this request :P
Thank you very much, and sorry for the late response. I downloaded the weights and they do seem better than the "raw" Unsloth ones.