
Bielik-Minitron-7B-v3.0-Instruct

Bielik-Minitron-7B-v3.0-Instruct is a generative text model with 7.35 billion parameters. It is an instruction-tuned version of a pruned Bielik-11B-v3-Base-20250730 model, developed in a three-way collaboration between the open-science project SpeakLeash, the High Performance Computing (HPC) center ACK Cyfronet AGH, and NVIDIA.

Following NVIDIA's Minitron methodology, the team used the NVIDIA Model Optimizer and the NVIDIA NeMo Framework in a two-stage compression process: structured pruning followed by knowledge distillation. This reduced the parameter count by about a third while retaining most of the original model's benchmark performance.
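Minitron-style width pruning ranks components such as MLP neurons, attention heads and embedding channels by activation-based importance measured on a small calibration set, then removes the lowest-ranked ones. The NumPy sketch below illustrates only the MLP-neuron case. The function names are hypothetical, the importance score (mean absolute activation) is an assumption, and this is not the NVIDIA Model Optimizer API.

```python
import numpy as np


def mlp_neuron_importance(hidden_acts: np.ndarray) -> np.ndarray:
    # Mean absolute activation per intermediate neuron over calibration tokens.
    # hidden_acts has shape (num_tokens, intermediate_size).
    return np.abs(hidden_acts).mean(axis=0)


def prune_mlp_width(w_up, w_down, hidden_acts, keep):
    """Keep the `keep` highest-scoring intermediate neurons of a two-matrix MLP.

    w_up:   (intermediate_size, hidden_size), maps hidden -> intermediate
    w_down: (hidden_size, intermediate_size), maps intermediate -> hidden
    """
    scores = mlp_neuron_importance(hidden_acts)
    # Top-`keep` neuron indices, re-sorted so surviving neurons keep their order.
    kept = np.sort(np.argsort(scores)[::-1][:keep])
    # Removing neuron j deletes row j of w_up and column j of w_down.
    return w_up[kept, :], w_down[:, kept], kept


rng = np.random.default_rng(0)
hidden_size, inter_size, num_tokens = 8, 16, 32
w_up = rng.normal(size=(inter_size, hidden_size))
w_down = rng.normal(size=(hidden_size, inter_size))
acts = rng.normal(size=(num_tokens, inter_size))
acts[:, :4] *= 10.0  # make neurons 0-3 clearly the most active

new_up, new_down, kept = prune_mlp_width(w_up, w_down, acts, keep=10)
print(new_up.shape, new_down.shape)  # (10, 8) (8, 10)
```

In a gated (SwiGLU-style) MLP, the gate projection would presumably be pruned with the same indices as the up projection. Distillation is then needed because the pruned network no longer computes exactly what the original did.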

The model was developed using a large multilingual corpus spanning 32 European languages, with particular emphasis on Polish data curated by the SpeakLeash team. The project ran on Poland's large-scale computing infrastructure within the PLGrid environment, specifically the Athena and Helios supercomputers at ACK Cyfronet AGH, supported by computational grant PLG/2024/016951. Access to NVIDIA hardware and software on these systems was essential for the compute-intensive training required to produce a model of this scale. As a result, the model handles Polish and other European languages well, giving accurate responses and performing complex linguistic tasks with up to 50% faster inference than the 11B baseline.

📚 Technical report: Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language

Model

The model is a compressed 7.35B parameter version of the Bielik 11B v3 model, specifically optimized for European languages. Leveraging a two-stage compression methodology inspired by the NVIDIA Minitron approach, we combined structured hybrid pruning and knowledge distillation to reduce the model's parameter count by 33.4% (from 11.04B to 7.35B). We utilized the NVIDIA Model Optimizer for structural pruning and the NVIDIA NeMo Framework for logit-based distillation to facilitate quality recovery. Following distillation, the model underwent a rigorous alignment pipeline consisting of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO-P), and Reinforcement Learning via Group Relative Policy Optimization (GRPO). The final model recovered approximately 90% of the baseline model's performance while providing up to a 50% inference speedup.
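Logit-based distillation trains the pruned student to match the teacher's output distribution over the vocabulary. The sketch below assumes a common formulation: the forward KL divergence between teacher and student softmax distributions at each token position. The exact loss, temperature, and any auxiliary terms used with NeMo for this model are not specified in this card, and the function names are hypothetical.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def logit_kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Forward KL(teacher || student) per token, averaged over tokens.

    Both arrays have shape (num_tokens, vocab_size).
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    p_teacher = np.exp(log_p_teacher)
    kl_per_token = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1)
    # The T^2 factor is the usual correction that keeps gradient scale
    # roughly independent of the temperature.
    return float(kl_per_token.mean() * temperature ** 2)


rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))  # 4 token positions, toy vocabulary of 10
student = rng.normal(size=(4, 10))
print(logit_kd_loss(teacher, teacher))        # 0.0: identical distributions
print(logit_kd_loss(student, teacher) > 0.0)  # True
```

Matching full distributions rather than only the next-token label gives the student a denser training signal, which is one reason logit distillation is favored for recovering quality after pruning.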

The post-distillation pipeline was designed to mirror the treatment of the Bielik-11B-v3.0-Instruct model, ensuring that efficiency gains from pruning did not come at the expense of instruction-following precision or safety.

The Supervised Fine-Tuning (SFT) and preference alignment (DPO-P) stages were conducted with ALLaMo, an open-source framework developed by Krzysztof Ociepa. It enables fast, efficient training of language models with architectures similar to LLaMA and Mistral.
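The card does not define DPO-P. The sketch below assumes it refers to DPO-Positive (DPOP), which adds a penalty to the standard DPO objective. The penalty activates when the policy assigns lower likelihood to the preferred response than the reference model does. The `beta` and `lam` values are illustrative, and this is not ALLaMo's implementation.

```python
import math


def softplus(x):
    # log(1 + exp(x)) computed without overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpop_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1, lam=5.0):
    """DPO-Positive loss for one preference pair (illustrative beta and lambda).

    Each argument is the summed log-probability log p(y|x) of the chosen or
    rejected response under the trained policy (pi_*) or frozen reference (ref_*).
    """
    chosen_logratio = pi_chosen - ref_chosen
    rejected_logratio = pi_rejected - ref_rejected
    # Non-zero only when the policy has become *less* likely than the
    # reference to produce the chosen response.
    penalty = max(0.0, ref_chosen - pi_chosen)
    margin = beta * (chosen_logratio - rejected_logratio - lam * penalty)
    return softplus(-margin)  # equals -log(sigmoid(margin))


# Policy identical to the reference: the loss is log 2.
print(round(dpop_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
# Same preference margin, but the policy lost likelihood on the chosen answer.
print(dpop_loss(-11.0, -13.0, -10.0, -12.0) > math.log(2))  # True
```

With `lam=0` the function reduces to the standard DPO loss. The penalty targets a known DPO failure mode where both responses lose likelihood as long as the rejected one loses more.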

The Reinforcement Learning stage used Group Relative Policy Optimization (GRPO) and its variant Dr. GRPO. These were chosen to improve token efficiency by reducing the models' tendency to inflate response length in order to maximize reward. RL training was run with the Volcano Engine Reinforcement Learning (VERL) framework, which provides a scalable, modular environment. The training corpus contained curated problems from the logic, STEM, mathematics, and tool-use domains. Every sample was chosen so that it could be used for Reinforcement Learning from Verifiable Rewards (RLVR): each problem has a definitive, automatically verifiable solution.
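The GRPO and Dr. GRPO differences described above can be sketched in a few lines of NumPy. The sketch assumes a binary verifiable reward that checks the last line of a response against a reference answer. `verifiable_reward`, `group_advantages` and `aggregate_token_losses` are hypothetical helpers, not the VERL API. Ratio clipping and KL terms are omitted.

```python
import numpy as np


def verifiable_reward(response: str, reference: str) -> float:
    """Hypothetical RLVR reward: 1.0 if the last non-empty line equals the reference."""
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    return 1.0 if lines and lines[-1] == reference.strip() else 0.0


def group_advantages(rewards, dr_grpo=False, eps=1e-6):
    """Advantages for G responses sampled for the same prompt."""
    r = np.asarray(rewards, dtype=float)
    centered = r - r.mean()  # baseline = group mean reward
    if dr_grpo:
        return centered  # Dr. GRPO drops the std normalisation
    return centered / (r.std() + eps)


def aggregate_token_losses(per_token_losses, dr_grpo=False, max_len=None):
    """Reduce per-token surrogate losses (one array per response) to a scalar."""
    if dr_grpo:
        # Constant normaliser (assumed here: group size * max generation length),
        # so every token counts the same regardless of its response's length.
        total = sum(float(np.sum(l)) for l in per_token_losses)
        return total / (len(per_token_losses) * max_len)
    # GRPO: average within each response (1/|o_i|), then over the group.
    return float(np.mean([np.mean(l) for l in per_token_losses]))


responses = ["2 + 2 = 4\n4", "The answer is\n5", "4", "I am not sure."]
rewards = [verifiable_reward(r, "4") for r in responses]
print(rewards)                                            # [1.0, 0.0, 1.0, 0.0]
print(group_advantages(rewards, dr_grpo=True).tolist())   # [0.5, -0.5, 0.5, -0.5]
```

With GRPO's per-response averaging, a token in a long response carries less weight than a token in a short one, so long incorrect answers are penalized less per token. Dr. GRPO's constant normalizer removes that asymmetry.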

Model description:

Developed by: SpeakLeash & ACK Cyfronet AGH
Language: Multilingual (32 European languages, optimized for Polish)
Model type: causal decoder-only
Pruned and finetuned from: Bielik-11B-v3-Base-20250730
License: Apache 2.0

Chat template

Bielik-Minitron-7B-v3.0-Instruct uses ChatML as the prompt format.

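For example (the user asks "What seasons do we have?" and the assistant replies "In Poland we have 4 seasons: spring, summer, autumn and winter."):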

prompt = "<s><|im_start|> user\nJakie mamy pory roku?<|im_end|> \n<|im_start|> assistant\n"
completion = "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|> \n"
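To make the layout explicit, the hypothetical `render_chatml` helper below builds a ChatML prompt string by hand. It is only a sketch. The example above shows a space after the special tokens, which this sketch omits. The exact whitespace and BOS handling are defined by the tokenizer's bundled chat template, so `apply_chat_template()` should be preferred for real inputs.

```python
def render_chatml(messages, add_generation_prompt=True, bos="<s>"):
    """Render a list of {"role", "content"} dicts as a ChatML string (sketch)."""
    parts = [bos]
    for msg in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Jakie mamy pory roku?"}])
print(repr(prompt))
```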

This format is available as a chat template via the apply_chat_template() method:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model_name = "speakleash/Bielik-Minitron-7B-v3.0-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.to(device)

messages = [
    # "Answer briefly, precisely and only in Polish."
    {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."},
    # "What seasons do we have in Poland?"
    {"role": "user", "content": "Jakie mamy pory roku w Polsce?"},
    # "In Poland we have 4 seasons: spring, summer, autumn and winter."
    {"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
    # "Which one is the warmest?"
    {"role": "user", "content": "Która jest najcieplejsza?"}
]

# add_generation_prompt=True appends the opening of the assistant turn,
# so the model answers instead of continuing the user's message
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
model_inputs = input_ids.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

The fully formatted input produced by apply_chat_template for the previous example:

<s><|im_start|> system
Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim.<|im_end|> 
<|im_start|> user
Jakie mamy pory roku w Polsce?<|im_end|> 
<|im_start|> assistant
W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|> 
<|im_start|> user
Która jest najcieplejsza?<|im_end|>
<|im_start|> assistant

Limitations and Biases

Bielik-Minitron-7B-v3.0-Instruct is a quick demonstration that the base model can be easily fine-tuned to achieve compelling and promising performance. It does not have any moderation mechanisms. We look forward to working with the community on ways to make the model respect guardrails, enabling deployment in environments that require moderated outputs.

Bielik-Minitron-7B-v3.0-Instruct can produce factually incorrect output and should not be relied on to produce factually accurate data. It was trained on various public datasets. Although considerable effort has gone into cleaning the training data, the model may still generate lewd, false, biased, or otherwise offensive outputs.

Responsible for training the model

Remigiusz Kinas (Bielik.AI) - conceptualization, coordination of RL training, data preparation, benchmarking and quantization
Paweł Kiszczak (Bielik.AI) - supervision of pruning and distillation
Sergio P. Perez (NVIDIA) - conceptualization, benchmarking and quantization
Krzysztof Ociepa (Bielik.AI) - team leadership, conceptualization, data preparation, process optimization and oversight of training
Łukasz Flis (Cyfronet AGH) - coordination and supervision of training
Adrian Gwoździej (Bielik.AI) - data preparation and data quality assurance
Krzysztof Wróbel (Bielik.AI) - benchmarks

The model could not have been created without the commitment and work of the entire SpeakLeash team, whose contribution is invaluable. Thanks to the hard work of many individuals, it was possible to gather a large amount of Polish content and to establish a collaboration between the open-science SpeakLeash project and the HPC center ACK Cyfronet AGH. Individuals who contributed to the creation of the model: Sebastian Kondracki, Marek Magryś, Igor Ciuciura, Szymon Baczyński, Dominika Basaj, Kuba Sołtys, Karol Jezierski, Jan Sowa, Anna Przybył, Agnieszka Ratajska, Witold Wydmański.

We gratefully acknowledge the Polish high-performance computing infrastructure PLGrid (HPC Center: ACK Cyfronet AGH) for providing computing facilities and support within computational grant no. PLG/2024/016951.

Legal Aspects

EU AI Act Transparency Documentation: Bielik 11B v3 EU Public Summary.pdf

Data Protection and Copyright Requests

For removal requests of personally identifiable information (PII) or of copyrighted content, please contact the respective dataset owners or us directly: biuro@speakleash.org.pl.

Citation

Please cite this model using the following format:

@misc{kinas2026bielikminitron7bcompressinglargelanguage,
      title={Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language}, 
      author={Remigiusz Kinas and Paweł Kiszczak and Sergio P. Perez and Krzysztof Ociepa and Łukasz Flis and Krzysztof Wróbel and Adrian Gwoździej},
      year={2026},
      eprint={2603.11881},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.11881}, 
}
@misc{BielikMinitron7Bv3i,
    title     = {Bielik-Minitron-7B-v3.0-Instruct model card},
    author    = {Kinas, Remigiusz and Kiszczak, Paweł and Perez, Sergio and Ociepa, Krzysztof and Flis, Łukasz and Gwoździej, Adrian and Wróbel, Krzysztof and {Bielik.AI Team} and {Cyfronet Team} and {NVIDIA Team}},
    year      = {2026},
    url       = {https://huggingface.co/speakleash/Bielik-Minitron-7B-v3.0-Instruct},
    note      = {Accessed: 2026-03-19}, % change this date
    urldate   = {2026-03-19} % change this date
}

Contact Us

If you have any questions or suggestions, please use the Discussion tab. To contact us directly, join the SpeakLeash Discord.
