# EuroLLM-22B-Instruct GPTQ (4-bit)
This is a 4-bit GPTQ quantized version of utter-project/EuroLLM-22B-Instruct-2512.
## Model Details
| Attribute | Value |
|---|---|
| Original Model | EuroLLM-22B-Instruct-2512 |
| Quantization | GPTQ 4-bit |
| Group Size | 128 |
| Activation Order | desc_act=True |
| Calibration Samples | 512 |
| Sequence Length | 1024 |
| Calibration Dataset | OpenHermes-2.5 (ChatML format) |
| Size | ~14GB |
## Quantization Configuration
```json
{
  "bits": 4,
  "group_size": 128,
  "desc_act": true,
  "damp_percent": 0.01,
  "sym": true,
  "true_sequential": true
}
```
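For reference, below is a minimal sketch of how a checkpoint can be quantized with this configuration using GPTQModel. The exact script used for this model is not published; the `QuantizeConfig` field names are taken from GPTQModel's documented API, and the calibration list is a placeholder standing in for the ~512 ChatML-formatted OpenHermes-2.5 samples.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Mirror the configuration above (field names assumed from GPTQModel's API)
quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=True,
    damp_percent=0.01,
    sym=True,
)

# Placeholder for ~512 ChatML-formatted calibration samples (OpenHermes-2.5)
calibration_data = [
    "<|im_start|>user\nWhat is the capital of France?<|im_end|>\n"
    "<|im_start|>assistant\nParis.<|im_end|>",
]

model = GPTQModel.load("utter-project/EuroLLM-22B-Instruct-2512", quant_config)
model.quantize(calibration_data)
model.save("EuroLLM-22B-Instruct-GPTQ")
```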
## Quantization Quality
Per-layer quantization losses stayed consistently low across all 53 transformer layers:
- Attention projections: 10⁻⁵ to 10⁻⁶ range
- MLP layers: 10⁻⁵ range
- No error explosion or instability
## Languages Supported
EuroLLM supports all 24 official EU languages plus additional European languages (Norwegian, Turkish, and Ukrainian): Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian.
## Usage
### With GPTQModel (Recommended)
```python
from gptqmodel import GPTQModel
from transformers import AutoTokenizer

model = GPTQModel.load("Euraika/EuroLLM-22B-Instruct-GPTQ", device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained("Euraika/EuroLLM-22B-Instruct-GPTQ")

# Build a ChatML prompt (see the Chat Template section below)
prompt = """<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Explain quantum computing in simple terms.<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With vLLM
```python
from vllm import LLM, SamplingParams

llm = LLM(model="Euraika/EuroLLM-22B-Instruct-GPTQ", quantization="gptq")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Your prompt here"], sampling_params)
print(outputs[0].outputs[0].text)
```
### With Transformers + AutoGPTQ
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading GPTQ checkpoints through Transformers requires the optimum
# package plus a GPTQ backend (gptqmodel or auto-gptq) to be installed.
model = AutoModelForCausalLM.from_pretrained(
    "Euraika/EuroLLM-22B-Instruct-GPTQ",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Euraika/EuroLLM-22B-Instruct-GPTQ")
```
## Chat Template
This model uses the ChatML format:
```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_message}<|im_end|>
```
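Rather than assembling this string by hand, you can let the tokenizer apply the template for you (assuming the ChatML template ships with the tokenizer, as is standard for instruct checkpoints):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Euraika/EuroLLM-22B-Instruct-GPTQ")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."},
]

# add_generation_prompt=True appends the opening <|im_start|>assistant tag
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```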
## Hardware Requirements
- Minimum: 16GB VRAM (RTX 4080, RTX A4000)
- Recommended: 24GB VRAM (RTX 4090, RTX A5000, RTX 6000)
- Supports multi-GPU inference via `device_map="auto"` (see the sketch below)
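As a sketch of multi-GPU sharding, you can cap per-device memory so that accelerate splits the weights across cards; the limits below are illustrative values for two 12GB GPUs, not tested settings for this model:

```python
from transformers import AutoModelForCausalLM

# accelerate shards the ~14GB of weights across both GPUs, keeping each
# device under the stated cap to leave headroom for the KV cache
model = AutoModelForCausalLM.from_pretrained(
    "Euraika/EuroLLM-22B-Instruct-GPTQ",
    device_map="auto",
    max_memory={0: "11GiB", 1: "11GiB"},
)
```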
## Credits
- Original Model: UTTER Project - EuroLLM team
- Quantization: Euraika using GPTQModel v5.6.12
- Calibration Dataset: OpenHermes-2.5
## License
Apache 2.0 (same as base model)