Xiangyi Li PRO

xdotli

https://www.xiangyi.li

AI & ML interests

None yet

Recent Activity

new activity 6 days ago

harborframework/parity-experiments:Add SkillsBench parity experiment (gemini-cli, 3x70 tasks with skills)

new activity 6 days ago

harborframework/parity-experiments:Add SkillsBench parity experiment (full data at xdotli/skillsbench-parity)

new activity 6 days ago

harborframework/parity-experiments:SkillsBench parity data (4/174)

upvoted 2 papers 17 days ago

MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents

Paper • 2603.09827 • Published 19 days ago • 29

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Paper • 2603.09229 • Published 20 days ago • 81

upvoted 2 papers about 1 month ago

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Paper • 2601.11868 • Published Jan 17 • 34

StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?

Paper • 2510.02209 • Published Oct 2, 2025 • 57

upvoted a collection about 1 month ago

SkillsBench

1 item • Updated Feb 17 • 1

upvoted a paper about 1 month ago

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Paper • 2602.12670 • Published Feb 13 • 56

upvoted 2 papers about 1 year ago

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong

Paper • 2501.09775 • Published Jan 16, 2025 • 32

HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs

Paper • 2503.02003 • Published Mar 3, 2025 • 48