Haoran Li
5 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces REFORGE, a black-box red-teaming framework that uses adversarial image prompts to reveal persistent vulnerabilities in current Image Generation Model Unlearning (IGMU) methods.
The paper introduces WebAgentGuard, a novel reasoning-driven, multimodal guard model that effectively detects prompt injection attacks in vulnerable web agents without compromising their functionality.
The paper introduces Jargon, a novel adversarial framework that exploits the vulnerability of LLMs to context-specific safety boundary blurring, achieving high attack success rates across multiple frontier models.
SkillRevise is an execution-grounded framework that iteratively refines initial, imperfect LLM agent skills by diagnosing defects from execution evidence and applying empirically validated edits, significantly boosting agent performance.
This paper introduces Ghostwriter, an attack framework demonstrating that LLMs are highly vulnerable to adopting misleading viewpoints when provided with fabricated, yet credible-looking, evidence.
Papers
Steering LLM Viewpoints through Fabricated Evidence Injection
Xi Yang, Chang Liu, Zhenglin Huang, Haoran Li +3 more
This paper introduces Ghostwriter, an attack framework demonstrating that LLMs are highly vulnerable to adopting misleading viewpoints when provided with fabricated, yet credible-looking, evidence.