Built by Teycir Ben Soltane
ArXivCSExplorer

Chen Wu

4 indexed papers

Recent (6 mo): 4
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 4

Top categories

AI ×3, NLP ×1

Frequent co-authors

Yangbo Wei (1×)
Zhen Huang (1×)
Shaoqiang Lu (1×)
Junhong Qian (1×)
Qifan Wang (1×)
Lei He (1×)

Research Timeline

2026
EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs

The paper introduces EHRBench, a large-scale, automated, and reliable benchmark derived from real Electronic Health Records (EHRs) to rigorously evaluate the clinical decision-making capabilities of Large Language Models (LLMs).

MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

The paper introduces MINDGAMES, a multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, and demonstrates that current evaluations are limited by structural scaffolding and error-survival confounds.

SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems

SkillSmith is a synergy-aware framework that jointly co-evolves skills and tools, significantly improving self-improving agent systems by modeling skill-tool interactions and diagnosing failures.

LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

LongAttnComp introduces a novel, two-stage fine-tuning framework for context compression that significantly improves long-context reasoning performance, matching or exceeding full-context accuracy on demanding tasks like code debugging.

Papers

cs.AI · Recent · May 31, 2026

SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems

Yangbo Wei, Zhen Huang, Shaoqiang Lu, Junhong Qian +3 more

SkillSmith is a synergy-aware framework that jointly co-evolves skills and tools, significantly improving self-improving agent systems by modeling skill-tool interactions and diagnosing failures.

cs.CL · Recent · May 31, 2026

LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

Mengmeng Ji, Ravi Shanker Raju, Jonathan Lingjie Li, Chen Wu

LongAttnComp introduces a novel, two-stage fine-tuning framework for context compression that significantly improves long-context reasoning performance, matching or exceeding full-context accuracy on…

cs.AI · Recent · May 28, 2026

EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs

Yuzhang Xie, Keqi Han, Yunpeng Xiao, Hejie Cui +6 more

The paper introduces EHRBench, a large-scale, automated, and reliable benchmark derived from real Electronic Health Records (EHRs) to rigorously evaluate the clinical decision-making capabilities of L…

cs.AI · Recent · May 28, 2026

MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng +49 more

The paper introduces MINDGAMES, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural s…
