#12: KV Cache per token and Doubling the context size (3 replies, opened about 2 months ago by HenkTenk)
#11: Running on 4 GPUs with TP=4 (3 replies, opened 3 months ago by nephepritou)
#10: Running on 6 GPUs (4 replies, 🤗 1, opened 5 months ago by 0xSero)
#9: Thank you and a couple QQs (2 replies, opened 6 months ago by Ewere)
#8: request for fp4 quants (opened 6 months ago by hareram241)
#7: Improved useability (2 replies, opened 6 months ago by HenkTenk)
#6: Model for 8 gpus (6 replies, opened 7 months ago by ilwoonam75)
#4: Any chance for more GLM quants? (2 replies, opened 7 months ago by koute)
#2: Please make one for the larger Non Air Variant (3 replies, opened 8 months ago by chriswritescode)
#1: Does this actually work with VLLM? (32 replies, opened 8 months ago by sirus)