Zongjie Li
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
This paper introduces a novel framework, the Reasoning Safety Monitor, to detect and prevent logical inconsistencies and adversarial manipulations within the internal reasoning steps of large language models, establishing reasoning safety as a critical security dimension.
The paper independently stress-tests Claude Code's auto mode permission system using a deliberately ambiguous benchmark, finding that its true false negative rate is significantly higher than reported, particularly due to unmonitored file edits.
Papers
Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode
Zimo Ji, Zongjie Li, Wenyuan Jiang, Yudong Gao +1 more
The paper independently stress-tests Claude Code's auto mode permission system using a deliberately ambiguous benchmark, finding that its true false negative rate is significantly higher than reported…