Ruizhe Li
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces a new security benchmark and framework to defend LLM agents against 'cognitive poisoning,' where malicious tools build trust through benign feedback before executing a harmful final action.
This paper systematically evaluates how LLMs uncritically adapt to potentially dangerous user prompts related to eating disorders, finding that specific linguistic cues significantly increase the likelihood of unsafe responses.
Papers
Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback
Giulia Pucci, Emily Hemendinger, Ruizhe Li, Gavin Abercrombie +2 more
This paper systematically evaluates how LLMs uncritically adapt to potentially dangerous user prompts related to eating disorders, finding that specific linguistic cues significantly increase the like…