Muhammad Zaid Hameed
2 indexed papers
Research Timeline
The paper introduces Persona-Conditioned Adversarial Prompting (PCAP), a novel framework that significantly enhances the discovery of jailbreaks by conditioning adversarial search on multiple attacker personas and strategies, boosting attack success rates from 58% to 97%.
The paper introduces Persona-Conditioned Adversarial Prompting (PCAP), a method that significantly improves LLM red-teaming by simulating diverse attacker personas, leading to the discovery of more comprehensive jailbreaks and robust defense datasets.
Papers
Persona-Conditioned Adversarial Prompting (PCAP): Multi-Identity Red-Teaming for Enhanced Adversarial Prompt Discovery
The paper introduces Persona-Conditioned Adversarial Prompting (PCAP), a novel framework that significantly enhances the discovery of jailbreaks by conditioning adversarial search on multiple attacker…