Daniel S. Brown
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces SONAR, a prompt sanitization framework that uses natural language inference metrics to identify and remove malicious instructions injected into LLM prompts, achieving near-zero attack success rates.
This study demonstrates that an LLM's assigned support role (e.g., Inform, Coach, Relate) significantly alters its safety profile and the types of risks it presents when assisting users in complex caregiving situations.
Papers
Inform, Coach, Relate, Listen: Auditing LLM Caregiving Support Roles
Drishti Goel, Agam Goyal, Veda Duddu, Olivia Pal +7 more
This study demonstrates that an LLM's assigned support role (e.g., Inform, Coach, Relate) significantly alters its safety profile and the types of risks it presents when assisting users in complex car…