ModernBERT-base Fine-tuned on CoNLL-2003 for NER

This model is a fine-tuned version of answerdotai/ModernBERT-base on the CoNLL-2003 dataset for Named Entity Recognition (NER).

ModernBERT supports long input sequences (up to 8,192 tokens) and uses an efficient alternating global/local attention design, making it a strong backbone for dense token-classification tasks such as NER.

Model Description

  • Developed by: Rúben Garrido
  • Model type: ModernBERT (Encoder-only Transformer)
  • Task: Named Entity Recognition (NER)
  • Labels: O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, B-MISC, I-MISC
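
The labels follow the standard BIO tagging scheme. As a minimal sketch, the corresponding id-to-label mapping would look like the snippet below; the numeric ordering shown is an assumption based on the CoNLL-2003 convention, and the model's config.json is the authoritative source.

# Assumed id2label mapping (BIO scheme, CoNLL-2003 label order); verify against the model's config.json
id2label = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC",
    7: "B-MISC", 8: "I-MISC",
}
label2id = {label: idx for idx, label in id2label.items()}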

Intended Uses & Limitations

This model is intended for identifying entities (Persons, Organizations, Locations, and Miscellaneous) in English text. Because the training data consists of 1996 and 1997 Reuters newswire, performance may be lower on other domains, informal text, or more recent entity mentions.

How to use

from transformers import pipeline

# aggregation_strategy="simple" merges subword pieces into whole-entity spans
ner_pipeline = pipeline("ner", model="RGarrido03/modernbert-conll2003-ner-base", aggregation_strategy="simple")
text = "The CERN headquarters are located in Geneva, Switzerland."
results = ner_pipeline(text)

for entity in results:
    print(f"Entity: {entity['word']}, Label: {entity['entity_group']}, Score: {entity['score']:.4f}")

Training Data

The model was trained on the CoNLL-2003 dataset, which consists of Reuters news stories from 1996 and 1997.

  • Train samples: 14,041
  • Validation samples: 3,250
  • Test samples: 3,453
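
The splits can be loaded directly from the Hugging Face Hub with the datasets library. A minimal sketch follows; depending on your datasets version, the identifier may resolve through the eriktks/conll2003 repository.

from datasets import load_dataset

# CoNLL-2003 as distributed on the Hugging Face Hub
dataset = load_dataset("conll2003")
print(dataset)  # DatasetDict with train / validation / test splits

example = dataset["train"][0]
print(example["tokens"])    # list of whitespace-tokenized words
print(example["ner_tags"])  # integer NER labels in the BIO scheme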

Training Procedure

Training Hyperparameters

The following hyperparameters were used during training (a corresponding TrainingArguments sketch appears after the list):

  • Learning rate: 5e-5 (with AdamW optimizer)
  • Batch size: 8
  • Epochs: 3.0
  • Weight decay: 0.01
  • Warmup ratio: 0.1
  • Max sequence length: 256
  • Label all tokens: True (subword pieces inherit parent labels)
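
A TrainingArguments configuration mirroring these values might look like the sketch below; the output directory, evaluation batch size, and any settings not listed above are placeholders rather than values taken from the original run.

from transformers import TrainingArguments

# Sketch of the reported hyperparameters; AdamW is the Trainer's default optimizer.
# output_dir and per_device_eval_batch_size are placeholders, not reported values.
training_args = TrainingArguments(
    output_dir="modernbert-conll2003-ner-base",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3.0,
    weight_decay=0.01,
    warmup_ratio=0.1,
)

# The 256-token limit is applied at tokenization time, e.g.
# tokenizer(examples["tokens"], truncation=True, max_length=256, is_split_into_words=True)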

Training Results (Evaluation on Test Split)

Metric      Value
Accuracy    0.9711
F1 Score    0.8851
Precision   0.8721
Recall      0.8985
Loss        0.1873

Evaluation on Validation Split

Metric      Value
Accuracy    0.9871
F1 Score    0.9416
Precision   0.9357
Recall      0.9475
Loss        0.0625
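
The span-level precision, recall, and F1 reported above are conventionally computed with seqeval over the BIO labels, after filtering out ignored sub-token positions. Below is a minimal sketch using the evaluate library; the label sequences shown are illustrative placeholders, not model output.

import evaluate

seqeval = evaluate.load("seqeval")

# Illustrative BIO-labeled sequences; in practice these come from the model's
# predictions and the gold labels, with ignored positions (label id -100) removed.
predictions = [["B-ORG", "O", "O", "O", "O", "B-LOC", "O", "B-LOC", "O"]]
references  = [["B-ORG", "O", "O", "O", "O", "B-LOC", "O", "B-LOC", "O"]]

results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_precision"], results["overall_recall"], results["overall_f1"])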

Environmental Impact

  • Runtime: ~11.5 minutes (694 seconds)
  • Hardware: MacBook Pro, M5 Pro 24GB (Training speed: ~62 samples/sec)

Citation

If you use this model, please cite the original CoNLL-2003 paper and the ModernBERT work.

@inproceedings{tjong-kim-sang-de-meulder-2003-introduction,
    title = "Introduction to the {CoNLL}-2003 Shared Task: Language-Independent Named Entity Recognition",
    author = "Tjong Kim Sang, Erik F.  and De Meulder, Fien",
    booktitle = "Proceedings of the Seventh Conference on Natural Language Learning at {HLT}-{NAACL} 2003",
    year = "2003",
    url = "https://aclanthology.org/W03-0419",
    pages = "142--147",
}