ProtoCycle-7B-SFT
Cold-start SFT checkpoint for ProtoCycle — an agentic protein design model
trained to invoke biology tools (scaffold retrieval, constraint building,
ESM inpainting, ProTrek scoring) via a <think> / <plan> / <tool_call> / <answer> protocol.
This checkpoint is the SFT stage initialised from
Qwen/Qwen2.5-7B-Instruct
and is the starting point for the subsequent RL stage
(Huggggooo/ProtoCycle-7B).
- Base model:
Qwen/Qwen2.5-7B-Instruct - Training framework: VeRL / Open-AgentRL
- Stage: multi-turn SFT on agentic tool-use trajectories
- Epochs: 5
- Sequence length: 32k (with Ulysses SP=4)
Training Data
2,000 agentic multi-turn trajectories for protein design, available at
Huggggooo/ProtoCycle-Data (sft/ subset).
How to Use
See the ProtoCycle repository: ProtoCycle repo.
Agent Protocol
<think> ... reasoning ... </think>
<plan> ... stage plan ... </plan>
<tool_call>{"name": "...", "arguments": {...}}</tool_call>
...
<answer>MAEGEITPLKTF...</answer>
Training Data
Agentic multi-turn trajectories for protein design (not released here).
License
Apache-2.0, consistent with the upstream VeRL / Open-AgentRL projects and the underlying Qwen2.5 license.
Citation
If you find this checkpoint useful, please cite the ProtoCycle paper (forthcoming) and the upstream frameworks it builds on: VeRL, Open-AgentRL, ProTrek and ESM.
- Downloads last month
- 816
Model tree for Huggggooo/ProtoCycle-7B-SFT
Base model
Qwen/Qwen2.5-7B