---
tags:
- chess
- transformer
- gpt
- mechanistic-interpretability
license: mit
---

# chessgpt-medium

ChessGPT model trained for mechanistic interpretability research.

## Model Details

- **Model Size**: ~88M parameters (estimated from the layer count, hidden size, and vocabulary below, assuming a standard GPT-2-style parameterization with tied embeddings)
- **Architecture**: GPT-style transformer
- **Vocabulary**: 4,211 tokens (4,208 UCI chess moves + 3 special tokens)
- **Context Length**: 256 tokens
- **Layers**: 12
- **Hidden Size**: 768
- **Attention Heads**: 12

## Training Configuration

- **Dataset**: Lichess/standard-chess-games
- **Min Elo**: 1800
- **Min Moves**: 10
- **Batch Size**: 32
- **Learning Rate**: 3e-4
- **Epochs**: 10

## Metrics

- **Loss**: 1.1781
- **Accuracy**: 0.7051
- **Perplexity**: 3.2484

## Usage

```python
from src.model import ChessGPT, load_config_from_yaml
from src.training.hf_utils import load_model_from_hub

# Load pretrained weights from the Hugging Face Hub
model = load_model_from_hub("taj-gillin/chessgpt")

# Or build the model from its config and load weights manually
config = load_config_from_yaml("configs/model/medium.yaml")
model = ChessGPT(config)
# ... load weights ...
```

## Research

This model is part of a mechanistic interpretability project on chess-playing transformers. The goal is to understand the internal representations and algorithms the model learns.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{chessgpt2024,
  title={ChessGPT: Mechanistic Interpretability of Chess Transformers},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/taj-gillin/chessgpt}
}
```
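The Usage section loads `configs/model/medium.yaml`. A plausible shape for that file, filled in with the hyperparameters listed in this card, is sketched below; the field names are assumptions about the repository's config schema, not its documented format.

```yaml
# Hypothetical configs/model/medium.yaml -- field names are assumed
model:
  vocab_size: 4211       # 4,208 UCI moves + 3 special tokens
  context_length: 256
  n_layers: 12
  hidden_size: 768
  n_heads: 12
training:
  batch_size: 32
  learning_rate: 3.0e-4
  epochs: 10
```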
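As a sanity check on the metrics reported above: for a language model, perplexity is the exponential of the mean per-token cross-entropy loss, so the loss and perplexity figures should agree.

```python
import math

# Reported mean cross-entropy loss (nats per token) from this model card
loss = 1.1781

# Perplexity = exp(mean cross-entropy loss)
perplexity = math.exp(loss)

print(round(perplexity, 4))  # matches the reported 3.2484 to ~3 decimal places
```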
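The vocabulary described above is move-level: one token id per UCI move string (e.g. `e2e4`) plus three special tokens. A minimal sketch of such a tokenizer is below; the special-token names and the move list are illustrative assumptions, not the actual training vocabulary.

```python
# Hypothetical special tokens -- the real names are not documented in this card
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>"]

def build_vocab(uci_moves):
    """Map special tokens, then UCI move strings, to consecutive ids."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for move in uci_moves:
        vocab[move] = len(vocab)
    return vocab

def encode(game_moves, vocab):
    """Turn a game (list of UCI moves) into ids, wrapped in <bos>/<eos>."""
    return [vocab["<bos>"]] + [vocab[m] for m in game_moves] + [vocab["<eos>"]]

# Tiny example vocabulary (the real one has 4,208 moves)
vocab = build_vocab(["e2e4", "e7e5", "g1f3"])
print(encode(["e2e4", "e7e5"], vocab))  # [1, 3, 4, 2]
```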