nielsr (HF Staff) committed on
Commit 0871eba · verified · 1 Parent(s): 0574e45

Add library_name and link to paper/code (#1)


- Add library_name and link to paper/code (affd556d87f5445e9df569ac9445048974a9a501)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +16 -10
README.md CHANGED
@@ -1,8 +1,15 @@
 ---
 language:
 - tr
 - en
 license: apache-2.0
 tags:
 - reward-model
 - turkish
@@ -14,18 +21,16 @@ tags:
 - evaluation
 - TRUBA
 - MN5
- base_model: Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
- pipeline_tag: text-classification
- datasets:
- - newmindai/armo-ultrafeedback-dataset
- - newmindai/armo-pair-dataset
- - newmindai/armo-dataset
 ---

 # Muhakim (ArmoRM-Turkish-Legal)

 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

 ## Model Description

 Muhakim (ArmoRM-Turkish-Legal) is a domain-specific multi-objective reward model trained for Turkish legal text assessment. Built upon the Skywork-Reward-V2-Llama-3.1-8B backbone (8B parameters) and augmented with a mixture-of-experts gating mechanism, the model produces fine-grained quality scores across five legally grounded dimensions. The training pipeline consists of three components: (i) multi-objective supervision that enables independent learning of five legal quality dimensions, (ii) preference-based training of a mixture-of-experts gating network to capture context-dependent importance of these dimensions, and (iii) a debiasing stage designed to mitigate length-related reward artifacts.
@@ -139,7 +144,8 @@ user_message = "Sözleşme feshi nasıl yapılır? [Legal context here]"
 assistant_response = "Sözleşme feshi yazılı bildirimle yapılabilir..."

 # Format for reward model (conversational format)
- text = f"User: {user_message}\nAssistant: {assistant_response}"
 inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)

 # Get reward score
@@ -185,7 +191,7 @@ The numerical calculations reported in this work were fully/partially performed
 ```bibtex
 @article{mecellem2026,
 title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
- author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and Çetin, İclal and Sağbaş, Ömer Can},
 journal={arXiv preprint arXiv:2601.16018},
 year={2026},
 month={January},
@@ -218,8 +224,8 @@ The numerical calculations reported in this work were fully/partially performed

 ## License

- This dataset is released under the Apache 2.0 License.

 ## Contact

- For questions: [info@newmind.ai](mailto:info@newmind.ai)
 
 ---
+ base_model: Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
+ datasets:
+ - newmindai/armo-ultrafeedback-dataset
+ - newmindai/armo-pair-dataset
+ - newmindai/armo-dataset
 language:
 - tr
 - en
 license: apache-2.0
+ pipeline_tag: text-generation
+ library_name: transformers
 tags:
 - reward-model
 - turkish
 
 - evaluation
 - TRUBA
 - MN5
 ---

 # Muhakim (ArmoRM-Turkish-Legal)

 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

+ This model is part of the **Mecellem** project, presented in the paper [Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain](https://huggingface.co/papers/2601.16018).
+
+ **GitHub Repository**: [newmindai/mecellem-models](https://github.com/newmindai/mecellem-models)
+
 ## Model Description

 Muhakim (ArmoRM-Turkish-Legal) is a domain-specific multi-objective reward model trained for Turkish legal text assessment. Built upon the Skywork-Reward-V2-Llama-3.1-8B backbone (8B parameters) and augmented with a mixture-of-experts gating mechanism, the model produces fine-grained quality scores across five legally grounded dimensions. The training pipeline consists of three components: (i) multi-objective supervision that enables independent learning of five legal quality dimensions, (ii) preference-based training of a mixture-of-experts gating network to capture context-dependent importance of these dimensions, and (iii) a debiasing stage designed to mitigate length-related reward artifacts.
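The gated multi-objective scoring described above can be sketched as follows. This is a minimal illustration only: the dimension names, scores, and gating logits are assumptions for the example, not the model's actual objectives or parameters.

```python
import math

# Illustrative dimension names (assumed, not the model's actual objectives)
DIMENSIONS = ["accuracy", "clarity", "citation", "completeness", "terminology"]

def softmax(logits):
    """Convert gating-network logits into convex weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_reward(scores, gate_logits):
    """Weighted sum of per-dimension scores under context-dependent gate weights."""
    weights = softmax(gate_logits)
    return sum(w * s for w, s in zip(weights, scores))

scores = [0.8, 0.6, 0.9, 0.7, 0.5]        # per-dimension quality scores (toy)
gate_logits = [2.0, 0.5, 1.0, 0.0, -1.0]  # gating-network output (toy)
reward = aggregate_reward(scores, gate_logits)  # scalar reward, bounded by the score range
```

Because the gate weights are a softmax, the final reward is always a convex combination of the per-dimension scores, which is what lets the model shift emphasis between dimensions depending on context.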
 
 assistant_response = "Sözleşme feshi yazılı bildirimle yapılabilir..."

 # Format for reward model (conversational format)
+ text = f"User: {user_message}\nAssistant: {assistant_response}"
 inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)

 # Get reward score
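The conversational formatting above can be captured in a small self-contained helper. The `format_chat` name is illustrative, not part of the model card's code; note the newline stays inside a single string literal.

```python
def format_chat(user_message: str, assistant_response: str) -> str:
    # One string with a literal newline separating the two turns
    return f"User: {user_message}\nAssistant: {assistant_response}"

text = format_chat(
    "Sözleşme feshi nasıl yapılır?",
    "Sözleşme feshi yazılı bildirimle yapılabilir...",
)
# `text` is then tokenized as in the snippet above, e.g.:
# inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
```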
 
 ```bibtex
 @article{mecellem2026,
 title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
+ author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and Çetin, İclal and Sağbaş, Ömer Can},
 journal={arXiv preprint arXiv:2601.16018},
 year={2026},
 month={January},
 

 ## License

+ This project is released under the Apache 2.0 License.

 ## Contact

+ For questions: [info@newmind.ai](mailto:info@newmind.ai)