How to use latincy/la_core_web_trf with spaCy:

    !pip install https://huggingface.co/latincy/la_core_web_trf/resolve/main/la_core_web_trf-any-py3-none-any.whl

    # Using spacy.load().
    import spacy
    nlp = spacy.load("la_core_web_trf")

    # Importing as module.
    import la_core_web_trf
    nlp = la_core_web_trf.load()
Commit 44eeba9 · Parent(s): 9208ad6
v3.9.2: align release (token_fix + enclitic_splitter); add CHANGELOG
Config/packaging alignment across the LatinCy family; model weights unchanged.
CHANGELOG.md
ADDED
@@ -0,0 +1,19 @@
+# Changelog
+
+All notable changes to **la_core_web_trf** (the spaCy pipeline model) are documented here.
+
+Format based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); LatinCy uses a `v3.{model-generation}.{patch}` scheme.
+
+## [3.9.2] - 2026-05-31
+
+### Changed
+- Enclitic splitting of *-que* is now handled by a dedicated `enclitic_splitter` pipeline component, decoupled from the tokenizer.
+- Sentence segmentation upgraded to upstream `la_senter` v3.9.2 (passed through to this pipeline), which adds the `token_fix` component — keeping sentences intact across parentheticals, dash asides, and closing quotes.
+
+### Notes
+- Alignment release across the LatinCy family (`sm`/`md`/`lg`/`trf`). Model weights are unchanged from v3.9.x; no retraining.
+
+## [3.9.0] - 2026-03-25
+
+### Added
+- Initial v3.9 public release: tagging, morphology, lemmatization, dependency parsing, NER, and sentence segmentation, trained on harmonized Universal Dependencies treebanks with LASLA data.
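The changelog entry describes *-que* splitting as its own pipeline step rather than tokenizer behavior. As a rough illustration of what such a step does, here is a minimal sketch over plain strings. It is not the `enclitic_splitter` implementation, which this commit does not show. The function name `split_que` and the exception list `KEEP_WHOLE` are hypothetical, and a real splitter would need a far more careful lexicon.

```python
# Illustrative only: a naive -que splitter over plain strings.
# KEEP_WHOLE is a hypothetical exception list of words where "que" is part
# of the word itself; it is not taken from LatinCy.
KEEP_WHOLE = {"atque", "neque", "quoque", "itaque", "usque", "quisque"}


def split_que(tokens):
    """Split a trailing -que off each token unless it is a listed exception."""
    out = []
    for tok in tokens:
        lower = tok.lower()
        # Require more than three characters so a bare "que" is left alone.
        if lower.endswith("que") and len(lower) > 3 and lower not in KEEP_WHOLE:
            out.extend([tok[:-3], tok[-3:]])
        else:
            out.append(tok)
    return out


print(split_que(["armaque", "atque", "cano"]))
# ['arma', 'que', 'atque', 'cano']
```

Keeping this logic in a separate step, as the release does, means the exception handling can be configured or disabled without touching tokenization.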
README.md
CHANGED
@@ -80,14 +80,25 @@ model-index:
 | Feature | Description |
 | --- | --- |
 | **Name** | `la_core_web_trf` |
-| **Version** | `3.
+| **Version** | `3.9.2` |
 | **spaCy** | `>=3.8.3,<3.9.0` |
-| **Default Pipeline** | `
-| **Components** | `
+| **Default Pipeline** | `enclitic_splitter`, `transformer`, `senter`, `token_fix`, `normer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `uv_normalizer`, `parser`, `harmonizer`, `remorpher`, `ner`, `trf_vectors` |
+| **Components** | `enclitic_splitter`, `transformer`, `senter`, `token_fix`, `normer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `lookup_lemmatizer`, `uv_normalizer`, `parser`, `harmonizer`, `remorpher`, `ner`, `trf_vectors` |
 | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
 | **Sources** | [CIRCSE/LASLA: LASLA Corpus](https://github.com/CIRCSE/LASLA/tree/v1.0.1)<br>[LatinCy NER](https://github.com/latincy/latincy-ner)<br>[UD_Latin-CIRCSE](https://github.com/UniversalDependencies/UD_Latin-CIRCSE)<br>[UD_Latin-ITTB](https://github.com/UniversalDependencies/UD_Latin-ITTB)<br>[UD_Latin-LLCT](https://github.com/UniversalDependencies/UD_Latin-LLCT)<br>[UD_Latin-Perseus](https://github.com/UniversalDependencies/UD_Latin-Perseus)<br>[UD_Latin-PROIEL](https://github.com/UniversalDependencies/UD_Latin-PROIEL)<br>[UD_Latin-UDante](https://github.com/UniversalDependencies/UD_Latin-UDante) |
 | **License** | `MIT` |
-| **Author** | [Patrick J. Burns
+| **Author** | [Patrick J. Burns](https://diyclassics.github.io/) |
+| **Contributors** | Tim Geelhaar (annotation, error analysis [v3.5.2: morphologizer, tagger, parser]); Nora Bernhardt (NER); Vincent Koch (NER) |
+
+## What's new in 3.9.2
+
+This is an alignment release across the LatinCy pipeline family (`la_core_web_sm`/`md`/`lg`/`trf`). Model weights are **unchanged** from v3.9.x — no retraining was performed; the changes are to pipeline configuration and packaging.
+
+- **Enclitic splitting** is now handled by a dedicated `enclitic_splitter` component, decoupled from the tokenizer. Splitting of the enclitic *-que* (e.g. *arma**que*** → *arma* + *que*) is an explicit, configurable step in the pipeline rather than tokenizer behavior.
+- **Sentence segmentation upgraded to [`la_senter`](https://huggingface.co/latincy/la_senter) v3.9.2.** The upstream LatinCy sentence-segmentation model has been updated to v3.9.2 and is passed through to this pipeline. It adds the `token_fix` component, which runs after `senter` to repair sentence boundaries the statistical model mis-splits — keeping sentences intact across parentheticals, dash asides, and closing quotes.
+
+See [CHANGELOG.md](CHANGELOG.md) for full version history.
+
 
 ### Label Scheme
 
@@ -126,4 +137,4 @@ model-index:
 | `TAGGER_LOSS` | 60260.49 |
 | `MORPHOLOGIZER_LOSS` | 447952.32 |
 | `TRAINABLE_LEMMATIZER_LOSS` | 383152.85 |
-| `PARSER_LOSS` | 3276429.69 |
+| `PARSER_LOSS` | 3276429.69 |
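The README text says `token_fix` runs after `senter` and repairs boundaries that were split inside parentheticals, dash asides, and closing quotes. The commit does not show how the component does this, so the following is only a sketch of the general idea, covering parentheses alone and working on plain sentence strings rather than spaCy tokens. The function name `merge_open_parentheticals` is hypothetical.

```python
def merge_open_parentheticals(sentences):
    """Rejoin consecutive sentence strings while a '(' is still unclosed.

    Assumption: a sentence boundary that falls inside an open parenthesis
    is a mis-split, so the next piece belongs to the previous sentence.
    """
    merged = []
    depth = 0  # number of currently unclosed '(' characters
    for sent in sentences:
        if depth > 0 and merged:
            # Still inside a parenthetical: glue onto the previous sentence.
            merged[-1] = merged[-1] + " " + sent
        else:
            merged.append(sent)
        # Update nesting depth; clamp at zero to tolerate stray ')'.
        depth = max(depth + sent.count("(") - sent.count(")"), 0)
    return merged


pieces = ["Caesar (ut dixi.", "Supra) venit.", "Vicit."]
print(merge_open_parentheticals(pieces))
# ['Caesar (ut dixi. Supra) venit.', 'Vicit.']
```

To see which components a loaded pipeline actually runs, and in what order, spaCy exposes `nlp.pipe_names`. For this release the README table lists `token_fix` directly after `senter`.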
la_core_web_trf-3.9.1-py3-none-any.whl → la_core_web_trf-3.9.2-py3-none-any.whl
RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:1f0baf9c9b733cc7b0c2d160e747164863fabe74187ef038bf1e325cbd7c5512
+size 1688570986
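The renamed wheel is stored through Git LFS, so the repository holds only a pointer recording the file's SHA-256 digest and byte size. A downloaded wheel can be checked against those values. The sketch below uses only the Python standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the diff above.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so a large wheel is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1f0baf9c9b733cc7b0c2d160e747164863fabe74187ef038bf1e325cbd7c5512
size 1688570986
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
# sha256 1688570986
```

Calling `matches_pointer("la_core_web_trf-3.9.2-py3-none-any.whl", pointer)` on a local copy of the wheel would then confirm that the download is complete and unmodified, assuming the file was saved under that name.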