Xing Niu
4 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper systematically evaluates six OpenClaw-series AI agent frameworks, demonstrating that these agentized systems possess significant security vulnerabilities that are distinct from and more severe than the underlying language models alone.
The paper introduces eTAMP, a novel attack that poisons LLM web agents' memory using only environmental observations, demonstrating cross-site and cross-session compromise without direct memory access.
The paper introduces an embedding disruption method to re-activate and strengthen built-in safeguards within LLMs, effectively detecting and defending against sophisticated jailbreak attacks.
The paper introduces Disrupt-and-Rectify Smoothing (DR-Smoothing), a novel two-stage defense mechanism that significantly improves LLM security against jailbreaking attacks by restoring disrupted inputs to a safe, in-distribution form.
Papers
Re-Triggering Safeguards within LLMs for Jailbreak Detection
Zheng Lin, Zhenxing Niu, Haoxuan Ji, Yuzhe Huang +1 more
The paper introduces an embedding disruption method to re-activate and strengthen built-in safeguards within LLMs, effectively detecting and defending against sophisticated jailbreak attacks.