Huan Sun
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereby addressing limitations in current continual learning benchmarks.
The paper introduces SkillHarm, a comprehensive benchmark and automated framework for evaluating skill-based attacks across the entire agent skill-use lifecycle, demonstrating that current agents remain highly vulnerable to both fixed-payload and self-mutating poisoning attacks.
Papers
AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents
The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereb…