morphism: information retrieval with neural signals

What is this?

morphism is a model pipeline that turns your neural data (specifically, EEG) into semantic embeddings. These embeddings drive a retrieval process that selects documents the model thinks are relevant to your neural profile. It is designed as a kind of 'computational telepathy': a way for the user to transfer cognitive content to a machine without using language.

Does it work?

Yes. The model reliably produces coarse-grained semantic representations from EEG that are significantly above chance across multiple evaluation settings and subjects. It generalizes across tasks and interfaces it was never trained on. The representations are approximate β€” this is not word-level mind-reading β€” but they carry real semantic content, and the effect is specific to the wearer's cognition rather than conversational context.

Results

All evaluations below were run with no feedback loop: the decoder's output was not visible during the sessions, and the evaluation data never appeared in the training, validation, or test sets. The model was trained exclusively on local LLM chats against an isolated database; the Signal and Claude evaluations are entirely separate datasets collected through different interfaces. The stronger performance on Claude chats is likely because that task is closer to the training distribution.

| Evaluation | n | Z-score |
|---|---|---|
| Signal chats (wearer's own messages) | 442 | 5.34Οƒ |
| Signal chats (other person's messages) | 525 | 0.21Οƒ |
| Claude chats | 246 | 10.78Οƒ |
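The README does not specify the exact statistic behind these z-scores. One standard way to obtain figures like them is to compare the true prediction-target similarity against a null distribution built by shuffling the pairings; a sketch assuming mean cosine similarity as the test statistic:

```python
import numpy as np

rng = np.random.default_rng(0)

def retrieval_zscore(pred, target, n_perm=1000):
    """Z-score of the true pred->target mean cosine similarity against
    a shuffled-pairing null. pred/target: (n, d) embedding arrays."""
    def mean_cosine(a, b):
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return float(np.mean(np.sum(a * b, axis=1)))

    true = mean_cosine(pred, target)
    # Null: break the pairing by permuting the targets.
    null = np.array([mean_cosine(pred, target[rng.permutation(len(target))])
                     for _ in range(n_perm)])
    return (true - null.mean()) / null.std()
```

A decoder at chance lands near 0σ under this scheme, which is the pattern seen for the other person's Signal messages.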

The Signal dissociation is the key result: the model decodes the wearer's own messages well above chance, but falls to chance on the other person's messages in the same conversation. This indicates the model is reading the wearer's cognition, not picking up conversational structure.

The model's predicted embeddings capture 62 dimensions of meaningful variance (at the 95% explained-variance level) out of the 1024-dimensional target space. The autoencoder bottleneck is 64 dimensions, so the semantic mapper passes through nearly all of the information the autoencoder preserves.
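A dimension count like this can be reproduced for any set of predicted embeddings by counting principal components up to 95% explained variance; a sketch (the exact procedure used for the figure above is an assumption):

```python
import numpy as np

def effective_dims(embeddings, threshold=0.95):
    """Number of principal components needed to explain `threshold`
    of the variance in an (n, d) set of embeddings."""
    X = embeddings - embeddings.mean(axis=0)
    # Squared singular values of the centered matrix are proportional
    # to per-component variances.
    s = np.linalg.svd(X, compute_uv=False)
    var = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(var), threshold) + 1)
```

Predictions squeezed through a 64-d bottleneck can never exceed 64 effective dimensions, which is why 62 is close to the ceiling.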

How can I use it?

You will need a 16-channel OpenBCI Cyton device. The model may be compatible with other EEG hardware; the autoencoder front-end should improve cross-device robustness, but the pipeline remains untested on non-OpenBCI devices.

Dependencies: pip install torch numpy faiss-cpu hnswlib transformers tqdm pyserial paramiko

Download the model weights and this repository, then create a retrieval index:

python3 morphism.py index create --corpus ~/documents/notes

Once the index is created, you can start a recording session:

python3 morphism.py record --output session.eeg

Then, run the decoding pipeline to produce a text output stream:

python3 morphism.py decode -f session.eeg \
  --autoencoder weights/autoencoder.pt \
  --semantic weights/semantic.pt

All subcommands support --help for full option details.

You will see a real-time retrieval of documents from your index that the model thinks are relevant to you. Because this model is a research preview, its output may be somewhat noisy. Relevance also varies between individuals, with the model performing more strongly on people who are closer to the patterns represented in the training data. The retrieved text is associative, not transcriptive β€” the model tracks coarse cognitive state rather than sentence-level content, surfacing thematically related passages from your corpus.

How was this created?

This model works in two stages. The first stage is an autoencoder that represents the neural data in a latent space. The second stage is a semantic mapper, which predicts a semantic embedding from the neural latent. This relatively simple architecture is surprisingly effective and lays the groundwork for future developments of this technology.
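A minimal PyTorch sketch of this two-stage shape. Only the 64-d bottleneck, the 1024-d target space, and the 16 channels come from this document; the layer widths, window length, and activations are illustrative assumptions, not the released architecture:

```python
import torch
import torch.nn as nn

class EEGAutoencoder(nn.Module):
    """Stage 1: compress a flattened EEG window into a 64-d latent.
    Layer sizes are illustrative, not the released architecture."""
    def __init__(self, in_dim=16 * 128, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.GELU(),
                                     nn.Linear(512, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 512), nn.GELU(),
                                     nn.Linear(512, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

class SemanticMapper(nn.Module):
    """Stage 2: map the 64-d neural latent to the 1024-d text-embedding space."""
    def __init__(self, latent=64, out_dim=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent, 256), nn.GELU(),
                                 nn.Linear(256, out_dim))

    def forward(self, z):
        return self.net(z)

ae, mapper = EEGAutoencoder(), SemanticMapper()
window = torch.randn(1, 16 * 128)   # 16 channels x 128 samples, flattened
_, z = ae(window)                   # neural latent, (1, 64)
embedding = mapper(z)               # semantic prediction, (1, 1024)
```

The autoencoder can be trained on unpaired EEG alone, with only the small mapper needing paired neural/text data, which is one practical advantage of splitting the pipeline this way.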

The underlying dataset is a large collection of paired neural measurements and text stimuli, collected by Eve Labs over a period of 20 months from approximately forty subjects. Training data was gathered naturalistically: subjects chatted with LLMs while wearing the headset, and the conversation text served as the paired stimulus.

Why give this away for free?

We believe that people should be able to understand and use the signals their own brains produce. Noninvasive neural interfaces are massively underexplored, and even consumer-grade EEG hardware carries far more information than is commonly assumed. We hope this model can serve as a demonstration of that idea.

On a practical level, new models such as this can be evaluated and understood much faster if the weights are released. Given the complexity of the task at hand, we hope that people can report their experiences and help advance the field.

Project structure

morphism
β”œβ”€β”€ morphism.py          # CLI entrypoint
β”œβ”€β”€ cyton.py             # OpenBCI Cyton+Daisy recording
β”œβ”€β”€ decode.py            # EEG β†’ semantic embedding β†’ FAISS search
β”œβ”€β”€ eegembed.py          # EEG autoencoder streaming
β”œβ”€β”€ embed.py             # Text embedding + SQLite storage
β”œβ”€β”€ retrieval.py         # Retrieval pipeline
β”œβ”€β”€ encoder_traced.pt    # Traced autoencoder weights
β”œβ”€β”€ semantic_traced.pt   # Traced semantic model weights
└── README.md

How can I get involved?

Join the Discord, follow @evelabsai on Twitter, or email hello@evelabs.info.
