Yuyin Zhou
3 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
This paper conducts the first real-world safety evaluation of the personal AI agent OpenClaw, demonstrating that its broad system access creates inherent vulnerabilities that significantly increase the attack success rate regardless of the underlying large language model.
The paper distinguishes between a model's ability to generate useful updates for external agent components (harness-updating) and its ability to benefit from those updates (harness-benefit), finding that updating capabilities are surprisingly uniform while benefit is maximized in mid-tier models.
The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validation and submission.
Papers
AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more
The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…