Don't Retrieve, Generate
Collection
Datasets and models generated and fine-tuned for the paper Don't Retrieve, Generate: Prompting LLMs for Synthetic Training Data in Dense Retrieval
How to use chungimungi/fine-tuned-msmarco with sentence-transformers:
from sentence_transformers import CrossEncoder
model = CrossEncoder("chungimungi/fine-tuned-msmarco")
query = "Which planet is known as the Red Planet?"
passages = [
"Venus is often called Earth's twin because of its similar size and proximity.",
"Mars, known for its reddish appearance, is often referred to as the Red Planet.",
"Jupiter, the largest planet in our solar system, has a prominent red spot.",
"Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]
scores = model.predict([(query, passage) for passage in passages])
print(scores)

The models used in the paper Don't Retrieve, Generate: Prompting LLMs for Synthetic Training Data in Dense Retrieval.
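The usage snippet prints one score per (query, passage) pair, in the same order as the passages list. To turn those scores into a ranking, one option is a small helper like the sketch below. It assumes that a higher score means a more relevant passage, which is the usual convention for cross-encoders but is not stated on this page. The name rank_passages and the example scores are hypothetical placeholders, not output from the model.

```python
from typing import List, Sequence, Tuple


def rank_passages(
    passages: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each passage with its score and sort from highest to lowest score.

    Assumption: a higher score indicates a more relevant passage.
    """
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    # float() also accepts NumPy scalars, such as those returned by model.predict.
    paired = [(passage, float(score)) for passage, score in zip(passages, scores)]
    return sorted(paired, key=lambda item: item[1], reverse=True)


# Placeholder inputs for illustration only; these are NOT real model scores.
example_passages = ["passage A", "passage B", "passage C"]
example_scores = [0.1, 0.9, 0.4]

for passage, score in rank_passages(example_passages, example_scores):
    print(f"{score:.3f}  {passage}")
```

With the snippet above, the same helper could be called as rank_passages(passages, scores) once scores has been computed by model.predict.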
If any of these models were useful, consider citing us :)
@misc{sinha2025dontretrievegenerateprompting,
title={Don't Retrieve, Generate: Prompting LLMs for Synthetic Training Data in Dense Retrieval},
author={Aarush Sinha},
year={2025},
eprint={2504.21015},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2504.21015},
}
Base model
distilbert/distilbert-base-uncased