Jun Zhu
3 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
This paper introduces TRAP, an adversarial attack that demonstrates how physical patches can hijack the Chain-of-Thought (CoT) reasoning process in Vision-Language-Action (VLA) models, forcing them to perform unintended actions.
The paper introduces Dummy-Aware Weighted Attack (DAWA), a novel evaluation method that significantly reduces the reported robustness of Dummy Classes-based defenses by simultaneously targeting both the true and dummy class labels.
The paper proposes SafeReview, a co-evolutionary adversarial training framework that significantly improves the robustness of LLM-based peer review systems against sophisticated adversarial hidden prompts.
Papers
SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts
Yuan Xin, Yixuan Weng, Minjun Zhu, Ying Ling +4 more
The paper proposes SafeReview, a co-evolutionary adversarial training framework that significantly improves the robustness of LLM-based peer review systems against sophisticated adversarial hidden pro…