Jun Zhou
6 indexed papers
Research Timeline
The paper introduces AgentTrap, a dynamic benchmark that measures LLM agent susceptibility to malicious side effects embedded within seemingly benign third-party skills, finding that agents often execute unsafe side effects while completing the visible user task.
This study provides the first measurement of authentication security in real-world remote Model Context Protocol (MCP) servers, finding pervasive and critical authentication weaknesses, particularly in dynamic client registration.
AIRGuard is a runtime authority control guard that operationalizes least privilege to prevent agent attacks by enforcing step-level authorization over external side effects.
AIRGuard is a runtime authority control guard that operationalizes least privilege to prevent language agents from executing unauthorized side effects, significantly reducing attack success rates on agent-specific vulnerabilities.
The paper introduces Source-Grounded Semantic Reinforcement Learning (SG-SRL), a framework that leverages abundant source-language monolingual data to improve target-language generation in low-resource settings by providing cross-lingual semantic supervision.
The paper introduces DistractionIF, a benchmark showing that larger LLMs are paradoxically less robust to benign, instruction-like noise in reference text, and suggests that reinforcement learning can restore this robustness.
Papers
Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation
Zeli Su, Ziyin Zhang, Zewei Pan, Zhou Liu +7 more
The paper introduces Source-Grounded Semantic Reinforcement Learning (SG-SRL), a framework that leverages abundant source-language monolingual data to improve target-language generation in low-resourc…