EuroLLM-22B-Instruct GPTQ (4-bit)

This is a 4-bit GPTQ quantized version of utter-project/EuroLLM-22B-Instruct-2512.

Model Details

Attribute             Value
-------------------   ------------------------------
Original Model        EuroLLM-22B-Instruct-2512
Quantization          GPTQ 4-bit
Group Size            128
Activation Order      desc_act=True
Calibration Samples   512
Sequence Length       1024
Calibration Dataset   OpenHermes-2.5 (ChatML format)
Quantized Size        ~14 GB

Quantization Configuration

{
  "bits": 4,
  "group_size": 128,
  "desc_act": true,
  "damp_percent": 0.01,
  "sym": true,
  "true_sequential": true
}
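
For reference, these fields map directly onto GPTQModel's QuantizeConfig, so a run like the one that produced this checkpoint can be sketched as follows. The calibration preprocessing (dataset ID, role mapping, sample selection) is illustrative and may differ from what was actually used:

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=True,
    damp_percent=0.01,
    sym=True,
    true_sequential=True,
)

# Illustrative calibration set: 512 OpenHermes-2.5 conversations rendered
# as ChatML text (dataset role names mapped to ChatML conventions).
role_map = {"system": "system", "human": "user", "gpt": "assistant"}
raw = load_dataset("teknium/OpenHermes-2.5", split="train").select(range(512))
calibration = [
    "".join(
        f"<|im_start|>{role_map[turn['from']]}\n{turn['value']}<|im_end|>\n"
        for turn in sample["conversations"]
    )
    for sample in raw
]

model = GPTQModel.load("utter-project/EuroLLM-22B-Instruct-2512", quant_config)
model.quantize(calibration)
model.save("EuroLLM-22B-Instruct-GPTQ")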

Quantization Quality

Per-layer quantization loss stayed low across all 53 transformer layers:

  • Attention projections: losses in the 10⁻⁵ to 10⁻⁶ range
  • MLP layers: losses in the 10⁻⁵ range
  • No error explosion or instability in later layers

Languages Supported

EuroLLM supports all 24 official EU languages plus additional European languages (Norwegian, Turkish, and Ukrainian): Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian

Usage

With GPTQModel (Recommended)

from gptqmodel import GPTQModel
from transformers import AutoTokenizer

model = GPTQModel.load("Euraika/EuroLLM-22B-Instruct-GPTQ", device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained("Euraika/EuroLLM-22B-Instruct-GPTQ")

# ChatML-formatted prompt (see the Chat Template section below)
prompt = """<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Explain quantum computing in simple terms.<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="Euraika/EuroLLM-22B-Instruct-GPTQ", quantization="gptq")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Your prompt here"], sampling_params)
print(outputs[0].outputs[0].text)

With Transformers + AutoGPTQ

# Loading GPTQ checkpoints through Transformers requires the optimum
# package plus a GPTQ backend (gptqmodel or auto-gptq) to be installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Euraika/EuroLLM-22B-Instruct-GPTQ",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Euraika/EuroLLM-22B-Instruct-GPTQ")

Chat Template

This model uses the ChatML format:

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_message}<|im_end|>
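
If the bundled tokenizer includes a ChatML chat template (as the base model's tokenizer does), the prompt can also be built programmatically rather than by hand; a minimal sketch, assuming the template is present:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Euraika/EuroLLM-22B-Instruct-GPTQ")
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."},
]
# add_generation_prompt=True appends the opening <|im_start|>assistant tag
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)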

Hardware Requirements

  • Minimum: 16GB VRAM (RTX 4080, RTX A4000)
  • Recommended: 24GB VRAM (RTX 4090, RTX A5000, RTX 6000)
  • Supports multi-GPU inference via device_map="auto"
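
A minimal multi-GPU sketch via the Transformers loader (the per-device memory caps are illustrative placeholders, not measured requirements):

from transformers import AutoModelForCausalLM

# Shard layers across two GPUs; tune the caps to your hardware.
model = AutoModelForCausalLM.from_pretrained(
    "Euraika/EuroLLM-22B-Instruct-GPTQ",
    device_map="auto",
    max_memory={0: "11GiB", 1: "11GiB"},
)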


License

Apache 2.0 (same as base model)
