nielsr (HF Staff) committed on
Commit 0871eba · verified · 1 Parent(s): 0574e45

Add library_name and link to paper/code (#1)


- Add library_name and link to paper/code (affd556d87f5445e9df569ac9445048974a9a501)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +16 -10
README.md CHANGED
@@ -1,8 +1,15 @@
 ---
 language:
 - tr
 - en
 license: apache-2.0
 tags:
 - reward-model
 - turkish
@@ -14,18 +21,16 @@ tags:
 - evaluation
 - TRUBA
 - MN5
- base_model: Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
- pipeline_tag: text-classification
- datasets:
- - newmindai/armo-ultrafeedback-dataset
- - newmindai/armo-pair-dataset
- - newmindai/armo-dataset
 ---

 # Muhakim (ArmoRM-Turkish-Legal)

 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

 ## Model Description

 Muhakim (ArmoRM-Turkish-Legal) is a domain-specific multi-objective reward model trained for Turkish legal text assessment. Built upon the Skywork-Reward-V2-Llama-3.1-8B backbone (8B parameters) and augmented with a mixture-of-experts gating mechanism, the model produces fine-grained quality scores across five legally grounded dimensions. The training pipeline consists of three components: (i) multi-objective supervision that enables independent learning of five legal quality dimensions, (ii) preference-based training of a mixture-of-experts gating network to capture context-dependent importance of these dimensions, and (iii) a debiasing stage designed to mitigate length-related reward artifacts.
@@ -139,7 +144,8 @@ user_message = "Sözleşme feshi nasıl yapılır? [Legal context here]"
 assistant_response = "Sözleşme feshi yazılı bildirimle yapılabilir..."

 # Format for reward model (conversational format)
- text = f"User: {user_message}\nAssistant: {assistant_response}"
 inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)

 # Get reward score
@@ -185,7 +191,7 @@ The numerical calculations reported in this work were fully/partially performed
 ```bibtex
 @article{mecellem2026,
 title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
- author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and Çetin, İclal and Sağbaş, Ömer Can},
 journal={arXiv preprint arXiv:2601.16018},
 year={2026},
 month={January},
@@ -218,8 +224,8 @@ The numerical calculations reported in this work were fully/partially performed

 ## License

- This dataset is released under the Apache 2.0 License.

 ## Contact

- For questions: [info@newmind.ai](mailto:info@newmind.ai)
 
 ---
+ base_model: Skywork/Skywork-Reward-Llama-3.1-8B-v0.2
+ datasets:
+ - newmindai/armo-ultrafeedback-dataset
+ - newmindai/armo-pair-dataset
+ - newmindai/armo-dataset
 language:
 - tr
 - en
 license: apache-2.0
+ pipeline_tag: text-generation
+ library_name: transformers
 tags:
 - reward-model
 - turkish
 
 - evaluation
 - TRUBA
 - MN5
 ---

 # Muhakim (ArmoRM-Turkish-Legal)

 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

+ This model is part of the **Mecellem** project, presented in the paper [Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain](https://huggingface.co/papers/2601.16018).
+
+ **GitHub Repository**: [newmindai/mecellem-models](https://github.com/newmindai/mecellem-models)
+
 ## Model Description

 Muhakim (ArmoRM-Turkish-Legal) is a domain-specific multi-objective reward model trained for Turkish legal text assessment. Built upon the Skywork-Reward-V2-Llama-3.1-8B backbone (8B parameters) and augmented with a mixture-of-experts gating mechanism, the model produces fine-grained quality scores across five legally grounded dimensions. The training pipeline consists of three components: (i) multi-objective supervision that enables independent learning of five legal quality dimensions, (ii) preference-based training of a mixture-of-experts gating network to capture context-dependent importance of these dimensions, and (iii) a debiasing stage designed to mitigate length-related reward artifacts.
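The gated multi-objective scoring described above can be sketched as follows. This is a minimal illustration only: the dimension names, scores, and gating logits are assumptions for the example, not the model's actual objectives or parameters.

```python
import math

# Illustrative dimension names (assumed, not the model's actual objectives)
DIMENSIONS = ["accuracy", "clarity", "citation", "completeness", "terminology"]

def softmax(logits):
    """Convert gating-network logits into convex weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_reward(scores, gate_logits):
    """Weighted sum of per-dimension scores under context-dependent gate weights."""
    weights = softmax(gate_logits)
    return sum(w * s for w, s in zip(weights, scores))

scores = [0.8, 0.6, 0.9, 0.7, 0.5]        # per-dimension quality scores (toy)
gate_logits = [2.0, 0.5, 1.0, 0.0, -1.0]  # gating-network output (toy)
reward = aggregate_reward(scores, gate_logits)  # scalar reward, bounded by the score range
```

Because the gate weights are a softmax, the final reward is always a convex combination of the per-dimension scores, which is what lets the model shift emphasis between dimensions depending on context.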
 
 assistant_response = "Sözleşme feshi yazılı bildirimle yapılabilir..."

 # Format for reward model (conversational format)
+ text = f"User: {user_message}\nAssistant: {assistant_response}"
 inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)

 # Get reward score
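The conversational formatting above can be captured in a small self-contained helper. The `format_chat` name is illustrative, not part of the model card's code; note the newline stays inside a single string literal.

```python
def format_chat(user_message: str, assistant_response: str) -> str:
    # One string with a literal newline separating the two turns
    return f"User: {user_message}\nAssistant: {assistant_response}"

text = format_chat(
    "Sözleşme feshi nasıl yapılır?",
    "Sözleşme feshi yazılı bildirimle yapılabilir...",
)
# `text` is then tokenized as in the snippet above, e.g.:
# inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
```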
 
 ```bibtex
 @article{mecellem2026,
 title={Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain},
+ author={Uğur, Özgür and Göksu, Mahmut and Çimen, Mahmut and Yılmaz, Musa and Şavirdi, Esra and Demir, Alp Talha and Güllüce, Rumeysa and Çetin, İclal and Sağbaş, Ömer Can},
 journal={arXiv preprint arXiv:2601.16018},
 year={2026},
 month={January},
 

 ## License

+ This project is released under the Apache 2.0 License.

 ## Contact

+ For questions: [info@newmind.ai](mailto:info@newmind.ai)