E5-base Latin Intertextuality Embedding Model

This model is a fine-tuned version of intfloat/multilingual-e5-base for generating embeddings of Latin texts to detect intertextual relationships.

The model is part of the Loci Similes benchmark setup (Schelb et al., 2026), evaluated on expert-verified Latin intertextual links. It is designed to work with the LociSimiles Python package API: https://julianschelb.github.io/locisimiles/api/.

Model Description

  • Task: Sentence embedding for detecting intertextual links between classical Latin authors
  • Model type: Sentence Transformer (Embedding Model)
  • Base model: intfloat/multilingual-e5-base
  • Negative sampling ratio: 1 positive to 10 negatives (1_pos_to_10_neg)
  • Uploaded fold: 0
  • Parameters: ~0.3B (F32)
  • Language: Latin
  • License: Apache 2.0

Usage

from sentence_transformers import SentenceTransformer

# Load the fine-tuned embedding model from the Hugging Face Hub
model = SentenceTransformer("julian-schelb/multilingual-e5-base-emb-lat-intertext-v1")

# Prefix texts with "Query: " / "Candidate: " to match the training setup
query_embedding = model.encode("Query: arma virumque cano")
candidate_embedding = model.encode("Candidate: arma virumque cano troiae qui primus ab oris")
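The resulting embeddings can be compared with cosine similarity to rank candidate passages against a query. A minimal, self-contained sketch of that ranking step, using small dummy vectors in place of real model.encode output (actual E5-base embeddings are 768-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Dummy 3-dimensional vectors standing in for model.encode(...) output.
query_emb = [1.0, 0.0, 1.0]
candidates = {
    "Candidate: arma virumque cano troiae qui primus ab oris": [0.9, 0.1, 0.8],
    "Candidate: tityre tu patulae recubans sub tegmine fagi": [0.1, 0.9, 0.0],
}

# Rank candidates by similarity to the query; higher scores suggest
# a more likely intertextual link.
ranked = sorted(
    candidates,
    key=lambda c: cosine_similarity(query_emb, candidates[c]),
    reverse=True,
)
```

With sentence-transformers itself, sentence_transformers.util.cos_sim(query_embedding, candidate_embedding) computes the same quantity directly on the encoded tensors.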

If prompt templates are configured in the model's configuration, prefer passing prompt_name to model.encode instead of prefixing texts manually:

  • prompt_name="query" for query texts
  • prompt_name="match" for candidate texts
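For illustration, this is roughly what prompt_name does: sentence-transformers prepends the configured prompt string to the input text before tokenization. The prompt strings below are assumptions based on the "Query: " / "Candidate: " prefixes shown above; the actual prompts (if any) live in the model's configuration on the Hub.

```python
# Hypothetical prompt strings -- check the model's prompts configuration
# for the real values.
PROMPTS = {"query": "Query: ", "match": "Candidate: "}

def apply_prompt(text: str, prompt_name: str) -> str:
    """Prepend the named prompt, mimicking encode(..., prompt_name=...)."""
    return PROMPTS[prompt_name] + text

print(apply_prompt("arma virumque cano", "query"))
# -> Query: arma virumque cano
```

Using prompt_name keeps query and candidate texts consistent with how the model was trained, which matters for embedding quality.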

Citation

@misc{schelb2026locisimilesbenchmarkextracting,
      title={Loci Similes: A Benchmark for Extracting Intertextualities in Latin Literature},
      author={Julian Schelb and Michael Wittweiler and Marie Revellio and Barbara Feichtinger and Andreas Spitz},
      year={2026},
      eprint={2601.07533},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2601.07533},
}