# modernbert-unfair-tos

ModernBERT fine-tuned for UNFAIR-ToS classification.
## Model Description
This model is fine-tuned on the LexGLUE UNFAIR-ToS dataset to detect unfair clauses in Terms of Service documents.
**Base model:** answerdotai/ModernBERT-base
## Performance
| Metric | Score |
|---|---|
| Exact Match Accuracy | 70.6% |
| Micro-F1 | 0.79 |
| Precision | 0.98 |
## Risk Categories

The model performs multi-label classification over 8 risk categories:
| ID | Category |
|---|---|
| 0 | Limitation of liability |
| 1 | Unilateral termination |
| 2 | Unilateral change |
| 3 | Content removal |
| 4 | Contract by using |
| 5 | Choice of law |
| 6 | Jurisdiction |
| 7 | Arbitration |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Agreemind/modernbert-unfair-tos"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "We reserve the right to terminate your account at any time."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# Multi-label task: apply a per-class sigmoid rather than a softmax
probs = torch.sigmoid(outputs.logits)

# Print every category whose probability exceeds the 0.5 threshold
labels = ["Limitation of liability", "Unilateral termination", "Unilateral change",
          "Content removal", "Contract by using", "Choice of law", "Jurisdiction", "Arbitration"]
for label, prob in zip(labels, probs[0]):
    if prob > 0.5:
        print(f"{label}: {prob:.2%}")
```
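If the uploaded checkpoint's config carries a meaningful `id2label` mapping (not confirmed here), the category names can be read from the model instead of being hard-coded; a minimal sketch:

```python
# Assumption: the checkpoint's config.json provides a meaningful id2label mapping.
id2label = model.config.id2label
for idx, prob in enumerate(probs[0]):
    if prob > 0.5:
        print(f"{id2label[idx]}: {prob:.2%}")
```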
## Training
- Dataset: LexGLUE UNFAIR-ToS (~5,500 samples)
- Loss: focal loss with class weighting (see the sketch after this list)
- Optimizer: AdamW with cosine LR schedule
- Epochs: 15 (with early stopping)
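The exact focal loss settings are not published here; the sketch below shows a standard multi-label (sigmoid) focal loss with per-class weights, where `gamma` and `class_weights` are illustrative placeholders rather than the values used in training.

```python
import torch
import torch.nn.functional as F

def weighted_focal_loss(logits, targets, class_weights, gamma=2.0):
    """Multi-label focal loss: down-weights easy examples via (1 - p_t) ** gamma.

    logits, targets: (batch, num_labels) tensors; class_weights: (num_labels,).
    gamma and class_weights are illustrative, not the training values.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                      # probability assigned to the true label
    loss = class_weights * (1.0 - p_t) ** gamma * bce
    return loss.mean()

# Hypothetical example: up-weight the rare Arbitration class (id 7)
class_weights = torch.ones(8)
class_weights[7] = 3.0
loss = weighted_focal_loss(torch.randn(4, 8),
                           torch.randint(0, 2, (4, 8)).float(),
                           class_weights)
```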
## Limitations
- The Arbitration class has lower recall (~38%) due to limited training samples; see the thresholding sketch after this list
- Optimized for English legal text
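Because Arbitration recall is low at the default 0.5 cut-off, a common mitigation is a per-class decision threshold tuned on a validation split; the thresholds below are illustrative, not tuned values.

```python
import torch

# Hypothetical per-class thresholds; a lower cut-off for Arbitration (id 7)
# trades some precision for recall on that class.
thresholds = torch.tensor([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.3])
preds = (probs >= thresholds).int()   # probs: (batch, 8) sigmoid outputs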
## Citation

```bibtex
@misc{agreemind-unfair-tos,
  author    = {Agreemind},
  title     = {modernbert-unfair-tos},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Agreemind/modernbert-unfair-tos}
}
```