Jiawen Zhang
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper identifies that background 'heartbeat' execution in personal AI agents like Claw can silently pollute the agent's memory with external misinformation, influencing user behavior without the user's knowledge or explicit prompt injection.
The paper proposes mitigating the progressive degradation of safety in language models caused by many-shot jailbreak attacks by appending a single, fixed safety demonstration at inference time.
Papers
Mitigating Many-shot Jailbreak Attacks with One Single Demonstration
Kejia Chen, Jiawen Zhang, Boheng Li, Pengcheng Li +5 more
The paper proposes mitigating the progressive degradation of safety in language models caused by many-shot jailbreak attacks by appending a single, fixed safety demonstration at inference time.