How to use sfarrukhm/bert-conll-ner with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="sfarrukhm/bert-conll-ner")
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("sfarrukhm/bert-conll-ner")
model = AutoModelForTokenClassification.from_pretrained("sfarrukhm/bert-conll-ner")

This model, bert-conll-ner, is a fine-tuned version of bert-base-uncased for Named Entity Recognition (NER), trained on the CoNLL-2003 dataset. It identifies and classifies entities in text: person names (PER), organizations (ORG), locations (LOC), and miscellaneous entities (MISC).
Based on the bert-base-uncased architecture.

- PER (Person)
- ORG (Organization)
- LOC (Location)
- MISC (Miscellaneous)
- O (Outside of any entity span)

The model achieves the following results on the CoNLL-2003 evaluation set:
| Metric | Value |
|---|---|
| Loss | 0.0649 |
| Precision | 93.59% |
| Recall | 95.07% |
| F1 Score | 94.32% |
| Accuracy | 98.79% |
These metrics show that the model identifies and classifies entities accurately on this evaluation set, with recall (95.07%) slightly higher than precision (93.59%).
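As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 score can be recomputed from the other two rows of the table. A minimal sketch (the variable names are illustrative and not part of the model's code):

```python
# Precision and recall as reported in the table above (in percent).
precision = 93.59
recall = 95.07

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.2f}%")  # agrees with the reported 94.32%
```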
- … -100) for padding tokens
- … [CLS] and [SEP].
- … B-PER, I-PER, etc.).

pip install transformers
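The training notes above mention the `-100` label value, the special tokens `[CLS]` and `[SEP]`, and BIO-style tags. The original preprocessing script is not shown, so the following is only a sketch of one common way such label alignment is done. It assumes word-to-token indices in the format returned by a fast tokenizer's `word_ids()` method; the function name `align_labels` and the label ids are hypothetical.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # value ignored by PyTorch's cross-entropy loss by default


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map word-level label ids onto subword tokens.

    `word_ids` holds None for special or padding tokens and otherwise
    the index of the word each token came from.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # [CLS], [SEP], and padding tokens carry no label.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # The first subword of a word keeps the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subwords are masked here (an assumption; labelling
            # them is an equally common choice).
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


# Hypothetical input: three words, the second split into two subwords.
word_ids = [None, 0, 1, 1, 2, None, None]  # [CLS] w0 w1 ##w1 w2 [SEP] [PAD]
word_labels = [1, 0, 5]                    # illustrative label ids
print(align_labels(word_ids, word_labels))
# [-100, 1, 0, -100, 5, -100, -100]
```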
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and model from the Hub
tokenizer = AutoTokenizer.from_pretrained("sfarrukhm/bert-conll-ner")
model = AutoModelForTokenClassification.from_pretrained("sfarrukhm/bert-conll-ner")

# aggregation_strategy="simple" merges subword tokens into whole entity spans
nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
text = "John lives in New York City."
result = nlp(text)
print(result)
[{'entity_group': 'PER',
'score': 0.99912304,
'word': 'john',
'start': 0,
'end': 4},
{'entity_group': 'LOC',
'score': 0.9993351,
'word': 'new york city',
'start': 14,
'end': 27}]
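Because the base model is uncased, the `word` field in the output above is lowercased. The `start` and `end` fields are character offsets into the input string, so the original casing can be recovered by slicing the input. A minimal sketch, reusing the output shown above as a literal (the variable name `entities` is illustrative):

```python
text = "John lives in New York City."

# Pipeline output copied from the example above.
result = [
    {"entity_group": "PER", "score": 0.99912304, "word": "john", "start": 0, "end": 4},
    {"entity_group": "LOC", "score": 0.9993351, "word": "new york city", "start": 14, "end": 27},
]

# Slice the input with the character offsets to restore the original casing.
entities = [(ent["entity_group"], text[ent["start"]:ent["end"]]) for ent in result]
print(entities)
# [('PER', 'John'), ('LOC', 'New York City')]
```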
bert-base-uncased by Google

Base model: google-bert/bert-base-uncased