Bielik-Minitron-7B-v3.0-Instruct
Bielik-Minitron-7B-v3.0-Instruct is a generative text model featuring 7.35 billion parameters. It is an instruct-aligned version of the pruned Bielik-11B-v3-Base-20250730 model. The model is the result of a trilateral collaboration between the open-science project SpeakLeash, the High Performance Computing (HPC) center ACK Cyfronet AGH, and NVIDIA.
Following NVIDIA's Minitron methodology, the team used the NVIDIA Model Optimizer and the NVIDIA NeMo Framework to run a two-stage compression process: structured pruning followed by knowledge distillation. This reduced the parameter count from 11.04B to 7.35B while recovering approximately 90% of the baseline model's performance.
The model was trained on a large multilingual corpus spanning 32 European languages, with a specific emphasis on Polish data curated by the SpeakLeash team. Training used Poland's large-scale computing infrastructure within the PLGrid environment and was conducted on the Athena and Helios supercomputers at ACK Cyfronet AGH, supported by computational grant PLG/2024/016951. Access to NVIDIA hardware and software resources was essential for the machine learning processes required to produce a model of this scale. As a result, the model handles Polish and other European languages well, providing accurate responses and performing complex linguistic tasks with significantly improved inference speed.
📚 Technical report: Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language
Model
The model is a compressed 7.35B parameter version of the Bielik 11B v3 model, specifically optimized for European languages. Leveraging a two-stage compression methodology inspired by the NVIDIA Minitron approach, we combined structured hybrid pruning and knowledge distillation to reduce the model's parameter count by 33.4% (from 11.04B to 7.35B). We utilized the NVIDIA Model Optimizer for structural pruning and the NVIDIA NeMo Framework for logit-based distillation to facilitate quality recovery. Following distillation, the model underwent a rigorous alignment pipeline consisting of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO-P), and Reinforcement Learning via Group Relative Policy Optimization (GRPO). The final model recovered approximately 90% of the baseline model's performance while providing up to a 50% inference speedup.
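The logit-based distillation step mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which the student is trained to minimize the forward KL divergence between the teacher's and the student's next-token distributions. This is not the NeMo implementation; the function names, the temperature parameter, and the toy logits are illustrative assumptions, and the actual loss configuration is described in the technical report.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL divergence KL(teacher || student) for one token position."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Toy vocabulary of four tokens (values are made up for illustration).
teacher = [2.0, 1.0, 0.1, -1.0]
close_student = [1.9, 1.1, 0.0, -1.0]
far_student = [-1.0, 0.1, 1.0, 2.0]

print(distillation_loss(teacher, teacher))        # identical logits give 0.0
print(distillation_loss(teacher, close_student))  # small positive value
print(distillation_loss(teacher, far_student))    # much larger value
```

In training, a loss of this kind would be averaged over all token positions in a batch, so the pruned student is pulled toward the full output distribution of the 11B teacher rather than only toward the single correct next token.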
The post-distillation pipeline was designed to mirror the treatment of the Bielik-11B-v3.0-Instruct model, ensuring that efficiency gains from pruning did not come at the expense of instruction-following precision or safety.
Supervised Fine-Tuning and Preference Alignment (DPO-P) stages were conducted using ALLaMo, an original open-source framework implemented by Krzysztof Ociepa. This framework allows users to train language models with architectures similar to LLaMA and Mistral in a fast and efficient way.
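As a rough illustration of the preference-alignment objective, the sketch below assumes that DPO-P denotes the DPO-Positive variant: standard DPO plus a penalty that activates when the policy assigns the preferred answer a lower log-probability than the reference model does. The card does not spell this out, so treat it as an assumption. This is not ALLaMo code; the function name and the `beta` and `lam` values are hypothetical.

```python
import math


def dpop_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
              beta=0.1, lam=5.0):
    """DPO-Positive loss for one preference pair.

    All four inputs are summed log-probabilities of a full response.
    Setting lam=0 gives the standard DPO loss.
    """
    chosen_gain = policy_chosen - ref_chosen
    rejected_gain = policy_rejected - ref_rejected
    # Penalty is non-zero only if the preferred answer became less likely
    # under the policy than under the reference model.
    penalty = max(0.0, ref_chosen - policy_chosen)
    margin = beta * (chosen_gain - rejected_gain - lam * penalty)
    # -log(sigmoid(margin)) in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Case A: the policy raised the preferred answer's log-probability.
case_a = dict(policy_chosen=-10.0, policy_rejected=-15.0,
              ref_chosen=-12.0, ref_rejected=-13.0)
# Case B: same preference margin, but the preferred answer became less likely.
case_b = dict(policy_chosen=-14.0, policy_rejected=-19.0,
              ref_chosen=-12.0, ref_rejected=-13.0)

print(dpop_loss(**case_a, lam=0.0), dpop_loss(**case_b, lam=0.0))  # plain DPO
print(dpop_loss(**case_a), dpop_loss(**case_b))                    # with penalty
```

Plain DPO cannot tell the two cases apart because only the difference between the two log-ratios enters the loss. The penalty term makes case B more costly, which discourages the policy from widening the margin by lowering the probability of both answers.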
The Reinforcement Learning stage used Group Relative Policy Optimization (GRPO) and its variant, Dr. GRPO, chosen to improve token efficiency by reducing the tendency of models to artificially increase response length to maximize rewards. RL training was run with the Volcano Engine Reinforcement Learning (VERL) framework, which provides a scalable and modular environment. The training corpus contained curated problems spanning logic, STEM, mathematics, and tool-use domains. All samples were selected for their suitability for Reinforcement Learning from Verifiable Rewards (RLVR): each problem has a definitive, verifiable solution.
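The difference between GRPO and Dr. GRPO described above can be sketched as follows. The example assumes the commonly cited formulations: GRPO centres each reward on the group mean, divides by the group standard deviation, and averages the loss over each response's own tokens, while Dr. GRPO drops both the standard-deviation division and the per-response length normalization. This is not VERL code; clipping, the policy ratio, and any KL term are omitted, and the reward function, sample answers, and token counts are hypothetical.

```python
import statistics


def verifiable_reward(model_answer, reference_answer):
    """Binary reward: 1.0 if the final answer matches the reference exactly."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0


def grpo_advantages(rewards, eps=1e-6):
    """GRPO: centre on the group mean and divide by the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ
    return [(r - mean) / (std + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Dr. GRPO: centre on the group mean, no std division."""
    mean = statistics.fmean(rewards)
    return [r - mean for r in rewards]


# One prompt, four sampled answers (hypothetical data): answer text and token count.
reference = "42"
samples = [("42", 120), ("41", 80), ("7", 400), ("42", 150)]

rewards = [verifiable_reward(answer, reference) for answer, _ in samples]
lengths = [length for _, length in samples]

# GRPO: normalized advantage, averaged over the response's own tokens.
grpo_token_weights = [a / n for a, n in zip(grpo_advantages(rewards), lengths)]

# Dr. GRPO: every token carries the unnormalized advantage (constant scale omitted).
dr_grpo_token_weights = dr_grpo_advantages(rewards)

for (answer, n), g, d in zip(samples, grpo_token_weights, dr_grpo_token_weights):
    print(f"answer={answer!r:5} tokens={n:4d} GRPO/token={g:+.5f} Dr.GRPO/token={d:+.2f}")
```

Under length normalization the 400-token wrong answer is penalized less per token than the 80-token wrong answer, which is one way a policy can learn to pad incorrect responses. Without it, both wrong answers receive the same per-token signal.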
Model description:
- Developed by: SpeakLeash & ACK Cyfronet AGH
- Language: Multilingual (32 European languages, optimized for Polish)
- Model type: causal decoder-only
- Pruned and finetuned from: Bielik-11B-v3-Base-20250730
- License: Apache 2.0
Chat template
Bielik-Minitron-7B-v3.0-Instruct uses ChatML as the prompt format.
For example:
prompt = "<s><|im_start|> user\nJakie mamy pory roku?<|im_end|> \n<|im_start|> assistant\n"
completion = "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|> \n"
This format is available as a chat template via the apply_chat_template() method:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model_name = "speakleash/Bielik-Minitron-7B-v3.0-Instruct"

# Load the tokenizer and the model weights in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."},
    {"role": "user", "content": "Jakie mamy pory roku w Polsce?"},
    {"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
    {"role": "user", "content": "Która jest najcieplejsza?"}
]

# Render the conversation with the model's ChatML chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = input_ids.to(device)
model.to(device)

# Sample up to 1000 new tokens, then decode the full sequence (prompt and completion).
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
Fully formatted input conversation produced by apply_chat_template for the previous example:
<s><|im_start|> system
Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim.<|im_end|>
<|im_start|> user
Jakie mamy pory roku w Polsce?<|im_end|>
<|im_start|> assistant
W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|>
<|im_start|> user
Która jest najcieplejsza?<|im_end|>
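For readers who want to see how the format is assembled, the sketch below rebuilds the single-turn example prompt from the start of this section by hand. The helper name `build_chatml_prompt` is hypothetical, and the whitespace around the special tokens is copied from the example strings as printed in this card; the tokenizer's own chat template remains authoritative and should be preferred in practice.

```python
def build_chatml_prompt(messages, add_generation_prompt=True, bos_token="<s>"):
    """Render messages in the ChatML layout shown in this card's example."""
    parts = [bos_token]
    for message in messages:
        parts.append(
            f"<|im_start|> {message['role']}\n{message['content']}<|im_end|> \n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model answers rather than
        # continuing the user's message.
        parts.append("<|im_start|> assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "Jakie mamy pory roku?"}])
print(repr(prompt))
```

The rendered conversation shown above ends after the last user turn. `apply_chat_template` accepts `add_generation_prompt=True`, which, when the template supports it, appends the opening of an assistant turn in the same way as the flag in this sketch.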
Limitations and Biases
Bielik-Minitron-7B-v3.0-Instruct is a quick demonstration that the base model can be easily fine-tuned to achieve compelling and promising performance. It does not have any moderation mechanisms. We look forward to working with the community on ways to make the model respect guardrails, allowing for deployment in environments that require moderated outputs.
Bielik-Minitron-7B-v3.0-Instruct can produce factually incorrect output and should not be relied on to produce factually accurate data. Bielik-Minitron-7B-v3.0-Instruct was trained on various public datasets. While great efforts have been made to clean the training data, it is possible that this model can generate lewd, false, biased or otherwise offensive outputs.
Responsible for training the model
- Remigiusz Kinas (Bielik.AI) - conceptualizing, coordinating RL trainings, data preparation, benchmarking and quantizations
- Paweł Kiszczak (Bielik.AI) - pruning and distillation supervision
- Sergio P. Perez (NVIDIA) - conceptualizing, benchmarking and quantizations
- Krzysztof Ociepa (Bielik.AI) - team leadership, conceptualizing, data preparation, process optimization and oversight of training
- Łukasz Flis (Cyfronet AGH) - coordinating and supervising the training
- Adrian Gwoździej (Bielik.AI) - data preparation and ensuring data quality
- Krzysztof Wróbel (Bielik.AI) - benchmarks
The model could not have been created without the commitment and work of the entire SpeakLeash team, whose contribution is invaluable. Thanks to the hard work of many individuals, it was possible to gather a large amount of content in Polish and establish collaboration between the open-science SpeakLeash project and the HPC center: ACK Cyfronet AGH. Individuals who contributed to the creation of the model: Sebastian Kondracki, Marek Magryś, Igor Ciuciura, Szymon Baczyński, Dominika Basaj, Kuba Sołtys, Karol Jezierski, Jan Sowa, Anna Przybył, Agnieszka Ratajska, Witold Wydmański.
We gratefully acknowledge Polish high-performance computing infrastructure PLGrid (HPC Center: ACK Cyfronet AGH) for providing computer facilities and support within computational grant no. PLG/2024/016951.
Legal Aspects
EU AI Act Transparency Documentation: Bielik 11B v3 EU Public Summary.pdf
Data Protection and Copyright Requests
For removal requests of personally identifiable information (PII) or of copyrighted content, please contact the respective dataset owners or us directly: biuro@speakleash.org.pl.
Citation
Please cite this model using the following format:
@misc{kinas2026bielikminitron7bcompressinglargelanguage,
title={Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language},
author={Remigiusz Kinas and Paweł Kiszczak and Sergio P. Perez and Krzysztof Ociepa and Łukasz Flis and Krzysztof Wróbel and Adrian Gwoździej},
year={2026},
eprint={2603.11881},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.11881},
}
@misc{BielikMinitron7Bv3i,
title = {Bielik-Minitron-7B-v3.0-Instruct model card},
author = {Kinas, Remigiusz and Kiszczak, Paweł and Perez, Sergio P. and Ociepa, Krzysztof and Flis, Łukasz and Gwoździej, Adrian and Wróbel, Krzysztof and {Bielik.AI Team} and {Cyfronet Team} and {NVIDIA Team}},
year = {2026},
url = {https://huggingface.co/speakleash/Bielik-Minitron-7B-v3.0-Instruct},
note = {Accessed: 2026-03-19}, % change this date
urldate = {2026-03-19} % change this date
}
Contact Us
If you have any questions or suggestions, please use the discussion tab. If you want to contact us directly, join our Discord SpeakLeash.