Add library_name and link to paper/code (#1)
Commit: affd556d87f5445e9df569ac9445048974a9a501
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

Before:

@@ -1,8 +1,15 @@
 ---
 language:
 - tr
 - en
 license: apache-2.0
 tags:
 - reward-model
 - turkish

@@ -14,18 +21,16 @@ tags:
 - evaluation
 - TRUBA
 - MN5
-base_model: Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
-pipeline_tag: text-classification
-datasets:
-- newmindai/armo-ultrafeedback-dataset
-- newmindai/armo-pair-dataset
-- newmindai/armo-dataset
 ---

 # Muhakim (ArmoRM-Turkish-Legal)

 [](https://opensource.org/licenses/Apache-2.0)

 ## Model Description

 Muhakim (ArmoRM-Turkish-Legal) is a domain-specific multi-objective reward model trained for Turkish legal text assessment. Built upon the Skywork-Reward-V2-Llama-3.1-8B backbone (8B parameters) and augmented with a mixture-of-experts gating mechanism, the model produces fine-grained quality scores across five legally grounded dimensions. The training pipeline consists of three components: (i) multi-objective supervision that enables independent learning of five legal quality dimensions, (ii) preference-based training of a mixture-of-experts gating network to capture context-dependent importance of these dimensions, and (iii) a debiasing stage designed to mitigate length-related reward artifacts.

@@ -139,7 +144,8 @@ user_message = "Sözleşme feshi nasıl yapılır? [Legal context here]"
 assistant_response = "Sözleşme feshi yazılı bildirimle yapılabilir..."

 # Format for reward model (conversational format)
-text = f"User: {user_message}
 inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)

 # Get reward score

@@ -185,7 +191,7 @@ The numerical calculations reported in this work were fully/partially performed
 ```bibtex
 @article{mecellem2026,
 title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
-author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and Çetin,
 journal={arXiv preprint arXiv:2601.16018},
 year={2026},
 month={January},

@@ -218,8 +224,8 @@ The numerical calculations reported in this work were fully/partially performed
 ## License

-This

 ## Contact

-For questions: [info@newmind.ai](mailto:info@newmind.ai)
After:

@@ -1,8 +1,15 @@
 ---
+base_model: Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
+datasets:
+- newmindai/armo-ultrafeedback-dataset
+- newmindai/armo-pair-dataset
+- newmindai/armo-dataset
 language:
 - tr
 - en
 license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 tags:
 - reward-model
 - turkish

@@ -14,18 +21,16 @@ tags:
 - evaluation
 - TRUBA
 - MN5
 ---

 # Muhakim (ArmoRM-Turkish-Legal)

 [](https://opensource.org/licenses/Apache-2.0)

+This model is part of the **Mecellem** project, presented in the paper [Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain](https://huggingface.co/papers/2601.16018).
+
+**GitHub Repository**: [newmindai/mecellem-models](https://github.com/newmindai/mecellem-models)
+
 ## Model Description

 Muhakim (ArmoRM-Turkish-Legal) is a domain-specific multi-objective reward model trained for Turkish legal text assessment. Built upon the Skywork-Reward-V2-Llama-3.1-8B backbone (8B parameters) and augmented with a mixture-of-experts gating mechanism, the model produces fine-grained quality scores across five legally grounded dimensions. The training pipeline consists of three components: (i) multi-objective supervision that enables independent learning of five legal quality dimensions, (ii) preference-based training of a mixture-of-experts gating network to capture context-dependent importance of these dimensions, and (iii) a debiasing stage designed to mitigate length-related reward artifacts.
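The description above pairs per-dimension quality scores with a context-dependent mixture-of-experts gate. A minimal sketch of that scoring scheme follows, with toy weights, a toy hidden size, and assumed dimension names; the released implementation, head shapes, and official dimension definitions may differ:

```python
import math
import random

random.seed(0)

# Illustrative stand-ins for the five legal quality dimensions
# (assumed names -- not the official ones from the model card).
DIMENSIONS = ["accuracy", "grounding", "clarity", "completeness", "terminology"]
HIDDEN_DIM = 16  # toy size; the real backbone hidden state is much larger

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

# Two heads on top of the backbone's final hidden state:
# a multi-objective regression head and a gating network (random toy weights).
W_obj = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in DIMENSIONS]
W_gate = [[random.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in DIMENSIONS]

def reward(hidden):
    obj_scores = matvec(W_obj, hidden)       # one quality score per dimension
    gate = softmax(matvec(W_gate, hidden))   # context-dependent mixture weights
    return sum(g * s for g, s in zip(gate, obj_scores))  # scalar reward

hidden = [random.gauss(0, 1) for _ in range(HIDDEN_DIM)]  # stands in for an embedding
print(f"reward = {reward(hidden):.4f}")
```

The gate lets the same five expert scores be weighted differently per prompt, which is what makes the importance of each dimension context-dependent rather than fixed.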
@@ -139,7 +144,8 @@ user_message = "Sözleşme feshi nasıl yapılır? [Legal context here]"
 assistant_response = "Sözleşme feshi yazılı bildirimle yapılabilir..."

 # Format for reward model (conversational format)
+text = f"User: {user_message}
+Assistant: {assistant_response}"
 inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)

 # Get reward score
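The two added lines split the prompt template across a literal newline; written as runnable Python, that newline needs an explicit `\n` escape. A minimal sketch of just the formatting step (tokenization and the forward pass then proceed as in the card's snippet):

```python
user_message = "Sözleşme feshi nasıl yapılır? [Legal context here]"
assistant_response = "Sözleşme feshi yazılı bildirimle yapılabilir..."

def build_prompt(user: str, assistant: str) -> str:
    # Conversational format expected by the reward model:
    # "User: ..." and "Assistant: ..." on two lines.
    return f"User: {user}\nAssistant: {assistant}"

text = build_prompt(user_message, assistant_response)
print(text)
# The card then tokenizes with
#   inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
# and runs a forward pass to obtain the scalar reward score.
```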
@@ -185,7 +191,7 @@ The numerical calculations reported in this work were fully/partially performed
 ```bibtex
 @article{mecellem2026,
 title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
+author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and İclal Çetin, Ömer Can Sağbaş},
 journal={arXiv preprint arXiv:2601.16018},
 year={2026},
 month={January},

@@ -218,8 +224,8 @@ The numerical calculations reported in this work were fully/partially performed
 ## License

+This project is released under the Apache 2.0 License.

 ## Contact

+For questions: [info@newmind.ai](mailto:info@newmind.ai)