# morphism: information retrieval with neural signals
## What is this?
morphism is a model pipeline that turns your neural data (specifically, EEG) into semantic embeddings. These embeddings drive a retrieval process that selects documents the model thinks are relevant to your neural profile. It is designed as a kind of 'computational telepathy': a way for the user to transfer cognitive content to a machine without using language.
## Does it work?
Yes. The model reliably produces coarse-grained semantic representations from EEG that are significantly above chance across multiple evaluation settings and subjects. It generalizes across tasks and interfaces it was never trained on. The representations are approximate (this is not word-level mind-reading), but they carry real semantic content, and the effect is specific to the wearer's cognition rather than conversational context.
## Results
All evaluations below were run with no feedback loop: the decoder output was not visible during the sessions, and the evaluation data never appeared in the training, validation, or test sets. The model was trained exclusively on local LLM chats against an isolated database; the Signal and Claude evaluations are entirely separate datasets collected through different interfaces. The model predicts Claude chats more effectively, likely because that task most closely resembles the training data.
| Evaluation | n | Z-score |
|---|---|---|
| Signal chats (wearer's own messages) | 442 | 5.34σ |
| Signal chats (other person's messages) | 525 | 0.21σ |
| Claude chats | 246 | 10.78σ |
The Signal dissociation is the key result: the model decodes the wearer's own messages well above chance, but falls to chance on the other person's messages in the same conversation. This indicates the model is reading the wearer's cognition, not picking up conversational structure.
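One standard way to obtain z-scores like those above is a permutation test: compare the mean similarity of matched (predicted embedding, true message embedding) pairs against a null distribution built by shuffling the pairing. A minimal numpy sketch of that idea, assuming cosine similarity and a shuffle-based null (the exact protocol used for the table is not specified here):

```python
import numpy as np

def retrieval_zscore(pred, target, n_perm=1000, seed=0):
    """Z-score of matched-pair similarity against a permutation null.

    pred, target: (n, d) arrays of predicted and true embeddings.
    Illustrative protocol, not necessarily the project's exact one.
    """
    rng = np.random.default_rng(seed)
    # Unit-normalise rows so dot products are cosine similarities.
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    observed = np.mean(np.sum(p * t, axis=1))  # mean matched similarity
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(len(t))          # break the pairing
        null[i] = np.mean(np.sum(p * t[perm], axis=1))
    return (observed - null.mean()) / null.std()
```

Under this scheme, decoding the other person's messages would land near 0σ precisely because shuffled and matched pairings look alike.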
The model's predicted embeddings capture 62 dimensions of meaningful variance (at 95%) out of the 1024-dimensional target space. The autoencoder bottleneck is 64 dimensions, so the semantic mapper is passing through nearly all of the information the autoencoder preserves.
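The effective-dimensionality figure can be measured by running PCA over the predicted embeddings and counting how many components are needed to reach 95% cumulative variance. A short sketch of that measurement (illustrative, not the project's evaluation code):

```python
import numpy as np

def dims_for_variance(embeddings, threshold=0.95):
    """Number of principal components needed to capture `threshold`
    of total variance -- one measure of effective dimensionality."""
    x = embeddings - embeddings.mean(axis=0)
    # Singular values of the centred data give component variances.
    s = np.linalg.svd(x, compute_uv=False)
    var = s ** 2
    cum = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cum, threshold) + 1)
```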
## How can I use it?
You will need an OpenBCI Cyton board with the Daisy module (16 channels). The model may be compatible with other EEG devices; the autoencoder that forms the front-end of the model should improve cross-device transfer, but the pipeline remains untested on non-OpenBCI hardware.
Dependencies:

```
pip install torch numpy faiss-cpu hnswlib transformers tqdm pyserial paramiko
```
Download the model weights and this repository, then create a retrieval index:

```
python3 morphism.py index create --corpus ~/documents/notes
```
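Conceptually, index creation amounts to embedding each document in your corpus and storing the vectors for nearest-neighbour search (the project stores them in SQLite and a FAISS index). A rough numpy sketch, with `embed_fn` standing in for the text-embedding model and `.txt` files as an assumed corpus layout:

```python
import numpy as np
from pathlib import Path

def build_index(corpus_dir, embed_fn):
    """Embed every .txt file under `corpus_dir` and stack the vectors.

    `embed_fn` is a placeholder for the text-embedding model; the real
    pipeline persists vectors to SQLite and a FAISS index instead of
    returning them in memory.
    """
    paths, vecs = [], []
    for p in sorted(Path(corpus_dir).rglob("*.txt")):
        v = np.asarray(embed_fn(p.read_text()), dtype=np.float64)
        vecs.append(v / np.linalg.norm(v))  # unit-normalise for cosine search
        paths.append(str(p))
    return paths, np.stack(vecs)
```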
Once your index is created, you can start a recording session:

```
python3 morphism.py record --output session.eeg
```
Then, run the decoding pipeline to produce a text output stream:

```
python3 morphism.py decode -f session.eeg \
    --autoencoder weights/autoencoder.pt \
    --semantic weights/semantic.pt
```
All subcommands support --help for full option details.
You will see a real-time stream of documents retrieved from your index that the model thinks are relevant to you. Because this model is a research preview, its output may be somewhat noisy. Relevance varies between individuals, with the model performing more strongly on people whose signals resemble the patterns in the training data. The retrieved text is associative, not transcriptive: the model tracks coarse cognitive state rather than sentence-level content, surfacing thematically related passages from your corpus.
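Each decode step amounts to pushing an EEG window through the two model stages and doing a cosine nearest-neighbour lookup over the index. A schematic numpy version, where `encode` and `map_semantic` stand in for the trained stages and the search is brute force (the real pipeline uses FAISS):

```python
import numpy as np

def decode_step(eeg_window, encode, map_semantic, index_vecs, paths, k=3):
    """One step of the decode loop: EEG window -> latent -> predicted
    text embedding -> top-k documents by cosine similarity.

    `encode` and `map_semantic` are placeholders for the two model
    stages; `index_vecs` rows are assumed unit-normalised.
    """
    z = encode(eeg_window)        # autoencoder latent (e.g. 64-d)
    q = map_semantic(z)           # predicted semantic embedding
    q = q / np.linalg.norm(q)
    sims = index_vecs @ q         # cosine similarities against the corpus
    top = np.argsort(-sims)[:k]
    return [(paths[i], float(sims[i])) for i in top]
```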
## How was this created?
This model works in two stages. The first stage is an autoencoder that represents the neural data in a latent space. The second stage is a semantic mapper that predicts a semantic embedding from that neural latent. This relatively simple architecture is surprisingly effective and lays the groundwork for future development of the technology.
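As a concrete picture of the two stages, here is a schematic PyTorch version. The layer widths and the window size (16 channels × 256 samples) are assumptions for illustration, not the released architecture; only the 64-d bottleneck and 1024-d embedding target come from the numbers above.

```python
import torch
import torch.nn as nn

class EEGAutoencoder(nn.Module):
    """Stage 1 (illustrative): compress a flattened EEG window
    into a 64-d latent."""
    def __init__(self, in_dim=16 * 256, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.GELU(), nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.GELU(), nn.Linear(512, in_dim))

    def forward(self, x):
        z = self.encoder(x)        # neural latent
        return self.decoder(z), z  # reconstruction + latent

class SemanticMapper(nn.Module):
    """Stage 2 (illustrative): predict a 1024-d text embedding
    from the neural latent."""
    def __init__(self, latent_dim=64, embed_dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.GELU(), nn.Linear(256, embed_dim))

    def forward(self, z):
        return self.net(z)
```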
The underlying dataset is a large collection of paired neural measurements and text stimuli, collected by Eve Labs over a period of 20 months from approximately forty subjects. Training data was gathered naturalistically: subjects chatted with LLMs while wearing the headset, with the conversation text serving as the paired stimuli.
## Why give this away for free?
We believe that people should be able to understand and use the signals their own brains produce. Noninvasive neural interfaces are massively underexplored, and even consumer-grade EEG hardware carries far more information than is commonly assumed. We hope this model can serve as a demonstration of that idea.
On a practical level, new models such as this one can be evaluated and understood much faster when the weights are released. Given the complexity of the task at hand, we hope people will report their experiences and help advance the field.
## Project structure
```
morphism
├── morphism.py          # CLI entrypoint
├── cyton.py             # OpenBCI Cyton+Daisy recording
├── decode.py            # EEG → semantic embedding → FAISS search
├── eegembed.py          # EEG autoencoder streaming
├── embed.py             # Text embedding + SQLite storage
├── retrieval.py         # Retrieval pipeline
├── encoder_traced.pt    # Traced autoencoder weights
├── semantic_traced.pt   # Traced semantic model weights
└── README.md
```
## How can I get involved?
Join the Discord, follow @evelabsai on Twitter, or email hello@evelabs.info.