An experiment in inverting sentence embeddings back into text. See details: https://github.com/fakerybakery/ReverseBERT. Inspired by https://github.com/vec2text/vec2text
| Component | Model |
|---|---|
| Embedding model | https://huggingface.co/google/embeddinggemma-300m |
| LLM backbone | https://huggingface.co/Qwen/Qwen3-0.6B-Base |
## Overview
Can you go from embeddings back to text?
The setup is pretty simple: take a sentence encoder and freeze it. Then train a small projection layer that maps those embeddings into "soft prompt" tokens for a language model. The LLM learns to reconstruct the original text from just those projected embeddings.
It's far from perfect. You probably won't recover the exact wording or meaning of the text, but the reconstructions capture the general idea/vibe of the original input.
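Here is a minimal sketch of that setup, assuming a frozen EmbeddingGemma encoder feeding a linear projector whose output is prepended as soft-prompt vectors to Qwen3's input embeddings. Names like `EmbeddingProjector` and `num_soft_tokens` are illustrative, not the repo's actual code; see the GitHub repo for the real training script.

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

encoder = SentenceTransformer("google/embeddinggemma-300m")  # stays frozen
llm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")


class EmbeddingProjector(nn.Module):
    """Maps one sentence embedding to a short sequence of soft-prompt vectors."""

    def __init__(self, embed_dim: int, llm_dim: int, num_soft_tokens: int = 16):
        super().__init__()
        self.num_soft_tokens = num_soft_tokens
        self.proj = nn.Linear(embed_dim, llm_dim * num_soft_tokens)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # (batch, embed_dim) -> (batch, num_soft_tokens, llm_dim)
        out = self.proj(embeddings)
        return out.view(embeddings.size(0), self.num_soft_tokens, -1)


projector = EmbeddingProjector(
    embed_dim=encoder.get_sentence_embedding_dimension(),
    llm_dim=llm.config.hidden_size,
)

# One training step (sketch). The encoder is frozen; the projector is the
# small trainable piece (the repo may or may not also fine-tune the LLM).
texts = ["Can you go from embeddings back to text?"]
with torch.no_grad():
    emb = encoder.encode(texts, convert_to_tensor=True)
soft_prompt = projector(emb)  # (1, num_soft_tokens, llm_dim)

labels = tokenizer(texts, return_tensors="pt").input_ids
target_embeds = llm.get_input_embeddings()(labels)
inputs_embeds = torch.cat([soft_prompt, target_embeds], dim=1)

# Mask the soft-prompt positions out of the reconstruction loss.
full_labels = torch.cat(
    [torch.full(soft_prompt.shape[:2], -100, dtype=torch.long), labels], dim=1
)
loss = llm(inputs_embeds=inputs_embeds, labels=full_labels).loss
```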
## Usage
See: https://github.com/fakerybakery/ReverseBERT/blob/main/infer.py
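For a sense of what inference looks like, here is a hypothetical loop in the same spirit, reusing `encoder`, `projector`, `llm`, and `tokenizer` from the sketch above; the actual `infer.py` is the authoritative reference and may differ:

```python
# Embed new text, project it to a soft prompt, and let the LLM decode
# a reconstruction from those projected embeddings alone.
with torch.no_grad():
    emb = encoder.encode(["an unseen input sentence"], convert_to_tensor=True)
    soft_prompt = projector(emb)
    out = llm.generate(inputs_embeds=soft_prompt, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```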
## Reconstruction samples
Coming soon
## Credits
As always, huge thanks to Hugging Face 🤗 for supporting the compute used to train this model!