An experiment; see details at https://github.com/fakerybakery/ReverseBERT. Inspired by https://github.com/vec2text/vec2text

Overview

Can you go from embeddings back to text?

The setup is simple: take a sentence encoder and freeze it, then train a small projection layer that maps its embeddings into "soft prompt" tokens for a language model. The LLM learns to reconstruct the original text from just those projected embeddings.
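Roughly, the setup looks like the sketch below (PyTorch + Hugging Face transformers / sentence-transformers). Checkpoint names, dimensions, and the number of soft prompt tokens are placeholders, not the exact training config used for this model.

```python
# Minimal sketch of the setup described above. Names marked "placeholder" are
# illustrative assumptions, not the actual training configuration.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

ENCODER_ID = "path/to/frozen-sentence-encoder"  # placeholder
LM_ID = "path/to/small-causal-lm"               # placeholder
NUM_PROMPT_TOKENS = 16                          # assumption

# 1. Frozen sentence encoder: only produces embeddings, never updated.
encoder = SentenceTransformer(ENCODER_ID)
for p in encoder.parameters():
    p.requires_grad_(False)

# 2. Language model that reconstructs text from the projected embeddings.
lm = AutoModelForCausalLM.from_pretrained(LM_ID)
tokenizer = AutoTokenizer.from_pretrained(LM_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 3. Trainable projection: one sentence embedding -> a short sequence of
#    "soft prompt" vectors living in the LM's input-embedding space.
embed_dim = encoder.get_sentence_embedding_dimension()
projector = nn.Linear(embed_dim, lm.config.hidden_size * NUM_PROMPT_TOKENS)

def reconstruction_loss(texts: list[str]) -> torch.Tensor:
    # Embed the original texts with the frozen encoder.
    with torch.no_grad():
        sent_emb = encoder.encode(texts, convert_to_tensor=True)  # (B, embed_dim)

    # Project to soft prompt tokens: (B, NUM_PROMPT_TOKENS, hidden_size).
    soft_prompts = projector(sent_emb).view(len(texts), NUM_PROMPT_TOKENS, -1)

    # Token embeddings of the target text the LM should reconstruct.
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    target_embeds = lm.get_input_embeddings()(batch["input_ids"])

    # Prepend the soft prompts and train with standard LM loss; prompt and
    # padding positions are masked out of the loss with -100.
    inputs_embeds = torch.cat([soft_prompts, target_embeds], dim=1)
    prompt_mask = torch.ones(len(texts), NUM_PROMPT_TOKENS, dtype=torch.long)
    attention_mask = torch.cat([prompt_mask, batch["attention_mask"]], dim=1)
    labels = torch.cat(
        [torch.full((len(texts), NUM_PROMPT_TOKENS), -100, dtype=torch.long),
         batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)],
        dim=1,
    )
    return lm(inputs_embeds=inputs_embeds, attention_mask=attention_mask, labels=labels).loss
```

In this sketch only the projection layer is shown as trainable, since that is the piece described above; whether (and how) the LM itself is also adapted is left to the actual training code in the repo.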

It's far from perfect. You usually can't reconstruct the exact wording of the text, but you can recover the general idea/vibe of the original input.

Usage

See: https://github.com/fakerybakery/ReverseBERT/blob/main/infer.py
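For a rough idea of what inference looks like, here is a hedged sketch that reuses the `encoder`, `projector`, `lm`, `tokenizer`, and `NUM_PROMPT_TOKENS` objects from the sketch above; the linked infer.py is the authoritative version, and the generation settings here are illustrative.

```python
# Hedged inference sketch: embed a text, project it to soft prompts, and let
# the LM generate a reconstruction from those prompt embeddings alone.
def reconstruct(text: str, max_new_tokens: int = 64) -> str:
    with torch.no_grad():
        emb = encoder.encode([text], convert_to_tensor=True)          # (1, embed_dim)
        soft_prompt = projector(emb).view(1, NUM_PROMPT_TOKENS, -1)   # (1, N, hidden)
        out_ids = lm.generate(
            inputs_embeds=soft_prompt,
            attention_mask=torch.ones(1, NUM_PROMPT_TOKENS, dtype=torch.long),
            max_new_tokens=max_new_tokens,
        )
    return tokenizer.decode(out_ids[0], skip_special_tokens=True)

print(reconstruct("The quick brown fox jumps over the lazy dog."))
```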

Reconstruction samples

Coming soon

Credits

As always, huge thanks to Hugging Face 🤗 for supporting the compute used to train this model!
