MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations Paper • 2505.14101 • Published May 20, 2025 • 3
Scaling Reasoning can Improve Factuality in Large Language Models Paper • 2505.11140 • Published May 16, 2025 • 7
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation Paper • 2504.07072 • Published Apr 9, 2025 • 9
How Do Hackathons Foster Creativity? Towards AI Collaborative Evaluation of Creativity at Scale Paper • 2503.04290 • Published Mar 6, 2025 • 1
HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings Paper • 2502.15411 • Published Feb 21, 2025 • 2
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published Feb 18, 2025 • 19
SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems Paper • 2502.12927 • Published Feb 18, 2025 • 1
On-Device LLMs for Home Assistant: Dual Role in Intent Detection and Response Generation Paper • 2502.12923 • Published Feb 18, 2025
SnakModel: Lessons Learned from Training an Open Danish Large Language Model Paper • 2412.12956 • Published Dec 17, 2024 • 2