
cyankiwi/GLM-4.5-Air-AWQ-4bit

Tags: Text Generation · Transformers · Safetensors · English · Chinese · glm4_moe · conversational · compressed-tensors
Community (12 discussions)

  • KV Cache per token and Doubling the context size — 3 comments (#12, opened about 2 months ago by HenkTenk)

  • Running on 4 GPUs with TP=4 — 3 comments (#11, opened 3 months ago by nephepritou)

  • Running on 6 GPUs — 🤗 1 · 4 comments (#10, opened 5 months ago by 0xSero)

  • Thank you and a couple QQs — 2 comments (#9, opened 6 months ago by Ewere)

  • Request for fp4 quants (#8, opened 6 months ago by hareram241)

  • Improved usability — 2 comments (#7, opened 6 months ago by HenkTenk)

  • Model for 8 GPUs — 6 comments (#6, opened 7 months ago by ilwoonam75)

  • Any chance for more GLM quants? — 2 comments (#4, opened 7 months ago by koute)

  • Please make one for the larger non-Air variant — 3 comments (#2, opened 8 months ago by chriswritescode)

  • Does this actually work with vLLM? — 32 comments (#1, opened 8 months ago by sirus)