Chen Wu

4 indexed papers

Top categories

AI ×3, NLP ×1

Frequent co-authors

Yangbo Wei 1×

Zhen Huang 1×

Shaoqiang Lu 1×

Junhong Qian 1×

Qifan Wang 1×

Lei He 1×

Research Timeline

2026

EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs

The paper introduces EHRBench, a large-scale, automated, and reliable benchmark derived from real Electronic Health Records (EHRs) to rigorously evaluate the clinical decision-making capabilities of Large Language Models (LLMs).

MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

The paper introduces MINDGAMES, a multi-game arena for evaluating the sustained social and strategic reasoning of LLM agents, and argues that current evaluations are confounded by structural scaffolding and error survival.

SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems

SkillSmith is a synergy-aware framework that co-evolves skills and tools, strengthening self-improving agent systems by modeling skill-tool interactions and diagnosing failures.

LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

LongAttnComp introduces a two-stage fine-tuning framework for context compression that improves long-context reasoning, matching or exceeding full-context accuracy on demanding tasks such as code debugging.

Papers

cs.AI | Recent | May 31, 2026

SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems

Yangbo Wei, Zhen Huang, Shaoqiang Lu, Junhong Qian +3 more

SkillSmith is a synergy-aware framework that co-evolves skills and tools, strengthening self-improving agent systems by modeling skill-tool interactions and diagnosing failures.

cs.CL | Recent | May 31, 2026