---
tags:
- chess
- transformer
- gpt
- mechanistic-interpretability
license: mit
---

# chessgpt-medium

ChessGPT model trained for mechanistic interpretability research.

## Model Details

- **Model Size**: ~88M parameters (estimated from the layer count, hidden size, and vocabulary below, assuming a standard GPT-2-style parameterization with tied embeddings)
- **Architecture**: GPT-style transformer
- **Vocabulary**: 4,211 tokens (4,208 UCI chess moves + 3 special tokens)
- **Context Length**: 256 tokens
- **Layers**: 12
- **Hidden Size**: 768
- **Attention Heads**: 12

## Training Configuration

- **Dataset**: Lichess/standard-chess-games
- **Min Elo**: 1800
- **Min Moves**: 10
- **Batch Size**: 32
- **Learning Rate**: 3e-4
- **Epochs**: 10

## Metrics

- **Loss**: 1.1781
- **Accuracy**: 0.7051
- **Perplexity**: 3.2484

## Usage

```python
from src.model import ChessGPT, load_config_from_yaml
from src.training.hf_utils import load_model_from_hub

# Load pretrained weights from the Hugging Face Hub
model = load_model_from_hub("taj-gillin/chessgpt")

# Or build the model from its config and load weights manually
config = load_config_from_yaml("configs/model/medium.yaml")
model = ChessGPT(config)
# ... load weights ...
```

## Research

This model is part of a mechanistic interpretability project on chess-playing transformers. The goal is to understand the internal representations and algorithms the model learns.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{chessgpt2024,
  title={ChessGPT: Mechanistic Interpretability of Chess Transformers},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/taj-gillin/chessgpt}
}
```
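The Usage section loads `configs/model/medium.yaml`. A plausible shape for that file, filled in with the hyperparameters listed in this card, is sketched below; the field names are assumptions about the repository's config schema, not its documented format.

```yaml
# Hypothetical configs/model/medium.yaml -- field names are assumed
model:
  vocab_size: 4211       # 4,208 UCI moves + 3 special tokens
  context_length: 256
  n_layers: 12
  hidden_size: 768
  n_heads: 12
training:
  batch_size: 32
  learning_rate: 3.0e-4
  epochs: 10
```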
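As a sanity check on the metrics reported above: for a language model, perplexity is the exponential of the mean per-token cross-entropy loss, so the loss and perplexity figures should agree.

```python
import math

# Reported mean cross-entropy loss (nats per token) from this model card
loss = 1.1781

# Perplexity = exp(mean cross-entropy loss)
perplexity = math.exp(loss)

print(round(perplexity, 4))  # matches the reported 3.2484 to ~3 decimal places
```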
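The vocabulary described above is move-level: one token id per UCI move string (e.g. `e2e4`) plus three special tokens. A minimal sketch of such a tokenizer is below; the special-token names and the move list are illustrative assumptions, not the actual training vocabulary.

```python
# Hypothetical special tokens -- the real names are not documented in this card
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>"]

def build_vocab(uci_moves):
    """Map special tokens, then UCI move strings, to consecutive ids."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for move in uci_moves:
        vocab[move] = len(vocab)
    return vocab

def encode(game_moves, vocab):
    """Turn a game (list of UCI moves) into ids, wrapped in <bos>/<eos>."""
    return [vocab["<bos>"]] + [vocab[m] for m in game_moves] + [vocab["<eos>"]]

# Tiny example vocabulary (the real one has 4,208 moves)
vocab = build_vocab(["e2e4", "e7e5", "g1f3"])
print(encode(["e2e4", "e7e5"], vocab))  # [1, 3, 4, 2]
```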