E5-base Latin Intertextuality Embedding Model

This model is a fine-tuned version of intfloat/multilingual-e5-base for generating embeddings of Latin texts to detect intertextual relationships.

The model is part of the Loci Similes benchmark setup (Schelb et al., 2026), evaluated on expert-verified Latin intertextual links. It is designed to work with the LociSimiles Python package API: https://julianschelb.github.io/locisimiles/api/.

Model Description

  • Task: Sentence embedding for detecting intertextual links between classical Latin authors
  • Model type: Sentence Transformer (Embedding Model)
  • Base model: intfloat/multilingual-e5-base
  • Negative sampling ratio: 1 positive to 10 negatives (1_pos_to_10_neg)
  • Uploaded fold: 0
  • Parameters: ~0.3B (F32)
  • Language: Latin
  • License: Apache 2.0

Usage

from sentence_transformers import SentenceTransformer

# Load the fine-tuned embedding model from the Hugging Face Hub
model = SentenceTransformer("julian-schelb/multilingual-e5-base-emb-lat-intertext-v1")

# Prefix texts with "Query: " / "Candidate: " to match the training setup
query_embedding = model.encode("Query: arma virumque cano")
candidate_embedding = model.encode("Candidate: arma virumque cano troiae qui primus ab oris")
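The resulting embeddings can be compared with cosine similarity to rank candidate passages against a query. A minimal, self-contained sketch of that ranking step, using small dummy vectors in place of real model.encode output (actual E5-base embeddings are 768-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Dummy 3-dimensional vectors standing in for model.encode(...) output.
query_emb = [1.0, 0.0, 1.0]
candidates = {
    "Candidate: arma virumque cano troiae qui primus ab oris": [0.9, 0.1, 0.8],
    "Candidate: tityre tu patulae recubans sub tegmine fagi": [0.1, 0.9, 0.0],
}

# Rank candidates by similarity to the query; higher scores suggest
# a more likely intertextual link.
ranked = sorted(
    candidates,
    key=lambda c: cosine_similarity(query_emb, candidates[c]),
    reverse=True,
)
```

With sentence-transformers itself, sentence_transformers.util.cos_sim(query_embedding, candidate_embedding) computes the same quantity directly on the encoded tensors.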

If prompt templates are configured in the model's configuration, prefer passing prompt_name to model.encode instead of prefixing texts manually:

  • prompt_name="query" for query texts
  • prompt_name="match" for candidate texts
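For illustration, this is roughly what prompt_name does: sentence-transformers prepends the configured prompt string to the input text before tokenization. The prompt strings below are assumptions based on the "Query: " / "Candidate: " prefixes shown above; the actual prompts (if any) live in the model's configuration on the Hub.

```python
# Hypothetical prompt strings -- check the model's prompts configuration
# for the real values.
PROMPTS = {"query": "Query: ", "match": "Candidate: "}

def apply_prompt(text: str, prompt_name: str) -> str:
    """Prepend the named prompt, mimicking encode(..., prompt_name=...)."""
    return PROMPTS[prompt_name] + text

print(apply_prompt("arma virumque cano", "query"))
# -> Query: arma virumque cano
```

Using prompt_name keeps query and candidate texts consistent with how the model was trained, which matters for embedding quality.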

Citation

@misc{schelb2026locisimilesbenchmarkextracting,
      title={Loci Similes: A Benchmark for Extracting Intertextualities in Latin Literature},
      author={Julian Schelb and Michael Wittweiler and Marie Revellio and Barbara Feichtinger and Andreas Spitz},
      year={2026},
      eprint={2601.07533},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2601.07533},
}