Charles Fleming
2 indexed papers
Research Timeline
The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step interactions.
The paper introduces 'covert control attacks,' a novel and stealthy data poisoning method that teaches LLMs an information hiding scheme, which allows malicious instructions to be encoded and decoded while bypassing existing defenses.
Papers
Cordyceps: Covert Control Attacks on LLMs via Data Poisoning