Typhoon-S-4B NitiBench-CCL Legal Agent (Research Preview)
Typhoon-S-4B NitiBench-CCL Legal Agent is a "sovereign," domain-specific research artifact. It is designed to demonstrate that a carefully designed post-training strategy, here InK-GRPO-based agentic reinforcement fine-tuning (RFT), can outperform brute-force scale on a domain-specific task.
For more information, please read the full technical report on arXiv (https://arxiv.org/abs/2601.18129).
This checkpoint is not a general-purpose instruction model and is not intended for production or real-world legal use. It is:
- Not a product model
- Not a general Thai legal assistant
- Not safe or reliable for legal advice
- Not expected to be useful outside the intended evaluation setup
If you are looking for a generally capable assistant or a model for real-world legal workflows, do not use this.
Training Overview
Typhoon-S-4B NitiBench-CCL Legal Agent is post-trained using GRPO-based RFT with two key extensions.
Agentic RFT
- The model is trained as a multi-step agent operating in a controlled RAG environment.
- Available tools:
  - search: semantic retrieval over a Thai legal corpus
  - read: document-level content access
- GRPO is applied over entire interaction trajectories, not single turns.
- Rewards focus on final-answer correctness.
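The bullets above can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the toy `CORPUS`, the keyword-based `search` (the real tool is semantic), the `run_episode` loop, the exact-match reward (the paper setup uses an LLM judge), and all helper names. The advantage computation follows the standard GRPO formulation, where rewards are standardized within a group of sampled trajectories; it is not taken from the Typhoon-S codebase.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev

# Hypothetical two-document corpus standing in for the Thai legal corpus.
CORPUS = {
    "doc-1": "A contract requires offer and acceptance.",
    "doc-2": "A company must hold an annual general meeting.",
}


def search(query: str) -> list[str]:
    """Toy keyword retrieval; returns ids of documents sharing a query word."""
    words = query.lower().split()
    return [doc_id for doc_id, text in CORPUS.items()
            if any(word in text.lower() for word in words)]


def read(doc_id: str) -> str:
    """Document-level access; returns the full text or an empty string."""
    return CORPUS.get(doc_id, "")


@dataclass
class Trajectory:
    steps: list = field(default_factory=list)  # (action, argument) pairs
    answer: str = ""


def run_episode(policy, question: str, max_steps: int = 4) -> Trajectory:
    """Roll out one multi-step episode; the policy maps an observation to an action."""
    trajectory = Trajectory()
    observation = question
    for _ in range(max_steps):
        action, argument = policy(observation)
        trajectory.steps.append((action, argument))
        if action == "search":
            observation = " ".join(search(argument))
        elif action == "read":
            observation = read(argument)
        else:  # any other action is treated as the final answer
            trajectory.answer = argument
            break
    return trajectory


def final_answer_reward(trajectory: Trajectory, reference: str) -> float:
    """Exact-match stand-in for the judge: 1.0 if the final answer matches."""
    return float(trajectory.answer.strip().lower() == reference.strip().lower())


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize rewards within one group of trajectories for the same question."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def scripted_policy(observation: str):
    """Hypothetical fixed policy: search, then read the hit, then answer."""
    if observation.endswith("?"):
        return ("search", "contract")
    if observation in CORPUS:
        return ("read", observation)
    return ("answer", "offer and acceptance")


trajectory = run_episode(scripted_policy, "What does a contract require?")
reward = final_answer_reward(trajectory, "Offer and acceptance")
```

In this sketch one advantage is computed per trajectory, so every tool call and the final answer in that trajectory share the same learning signal. That is one way to picture what applying GRPO over entire interaction trajectories means.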
InK-GRPO (Injected Knowledge GRPO)
GRPO is augmented with a stochastic auxiliary next-token prediction objective on in-domain Thai legal text.
This allows domain knowledge injection during reinforcement fine-tuning without collapsing into pure supervised learning.
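One possible reading of a "stochastic auxiliary objective" is sketched below; the technical report defines the actual formulation. The injection probability `p_inject`, the weight `ce_weight`, and both function names are hypothetical, and the losses are plain floats rather than tensors.

```python
import math
import random
from typing import Optional


def next_token_ce(token_log_probs: list[float]) -> float:
    """Mean negative log-likelihood of the observed next tokens."""
    return -sum(token_log_probs) / len(token_log_probs)


def ink_grpo_step_loss(grpo_loss: float, ce_loss: float,
                       p_inject: float = 0.5, ce_weight: float = 1.0,
                       rng: Optional[random.Random] = None) -> float:
    """With probability p_inject, add the weighted CE term to the GRPO loss."""
    rng = rng or random.Random()
    if rng.random() < p_inject:
        return grpo_loss + ce_weight * ce_loss
    return grpo_loss


# Assumed log-probabilities the model assigns to two in-domain tokens.
ce = next_token_ce([math.log(0.5), math.log(0.25)])
loss = ink_grpo_step_loss(grpo_loss=0.2, ce_loss=ce, p_inject=0.5)
```

In this sketch, steps that skip the CE term are pure GRPO updates, so the supervised signal is only mixed in occasionally rather than replacing the reinforcement objective.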
Training Data (High-Level)
Training is centered on NitiBench (CCL) and aligned Thai legal corpora:
- RFT (GRPO):
  - Question–answer tasks derived from airesearch/WangchanX-Legal-ThaiCCL-RAG (CCL split)
  - Rewarded based on correctness against reference answers (LLM-as-a-judge in the paper setup)
- CE data (InK-GRPO):
  - In-domain Thai legal text for auxiliary next-token prediction
Exact datasets, filtering, and preprocessing are described in the Typhoon-S technical report and NitiBench documentation.
Evaluation
Agent-Based Evaluation Required
This model is only meaningful when evaluated using the official agentic setup:
https://github.com/scb-10x/typhoon-s/tree/master/evaluation
Evaluating this checkpoint outside the specified agent + RAG environment will produce non-comparable and misleading results.
NitiBench (Thai Legal Reasoning, Agentic)
| Model | NitiBench Accuracy |
|---|---|
| Qwen3-4B-Instruct-2507 + Agent | 46.11% |
| GPT-5 + Built-in Search | 38.07% |
| GPT-5 + Agent | 75.34% |
| GRPO + Agent | 73.73% |
| Typhoon-S-4B NitiBench-CCL InK-GRPO + Agent | 78.02% |
Results are benchmark- and environment-specific and should not be interpreted as general legal competence.
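For reference, the percentage-point gaps implied by the table can be computed directly. The dictionary below only restates the reported numbers; the variable names are illustrative.

```python
# Accuracies (%) as reported in the table above.
reported = {
    "Qwen3-4B-Instruct-2507 + Agent": 46.11,
    "GPT-5 + Built-in Search": 38.07,
    "GPT-5 + Agent": 75.34,
    "GRPO + Agent": 73.73,
    "Typhoon-S-4B NitiBench-CCL InK-GRPO + Agent": 78.02,
}

ink_grpo = reported["Typhoon-S-4B NitiBench-CCL InK-GRPO + Agent"]

# Gap of the InK-GRPO agent over every other row, in percentage points.
gaps = {name: round(ink_grpo - acc, 2)
        for name, acc in reported.items() if acc != ink_grpo}

for name, gap in gaps.items():
    print(f"{name}: +{gap:.2f} points")
```

The gap over GRPO + Agent is 4.29 points and the gap over GPT-5 + Agent is 2.68 points, both specific to this benchmark and environment.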
How to Use (Research Only)
This checkpoint is intended only for:
- Studying Agentic RFT and InK-GRPO behavior
Recommended Usage
Run the official agentic evaluation pipeline:
https://github.com/scb-10x/typhoon-s/tree/master/evaluation
Expected conditions:
- Agent-style prompting
- search/read tools enabled
- Thai legal corpus aligned with NitiBench
- Evaluation protocol unchanged
Intended Uses & Limitations
Intended Use
- Research-only experimentation
- Benchmark comparison under NitiBench agentic evaluation
Limitations & Risks
- Not a deployable model
- Not legal advice
- Benchmark-specific optimization
- Environment-dependent performance
- No safety, bias, or robustness guarantees
- May hallucinate statutes, cases, or interpretations
Citation
If you use this model or its methods, please cite the Typhoon-S technical report.
If you use the dataset, please cite NitiBench directly.
@misc{pipatanakul2026typhoonsminimalopenposttraining,
title={Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models},
author={Kunat Pipatanakul and Pittawat Taveekitworachai},
year={2026},
eprint={2601.18129},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.18129},
}