Xuan Luo
2 indexed papers
Research Timeline
The paper proposes a novel attack paradigm showing that compromising a single robot in an LLM-controlled multi-robot system lets malicious intent propagate rapidly, causing coordinated unsafe actions across the entire system.
The paper introduces BAIT, a three-step jailbreak framework that systematically forces large language models to disclose harmful information by leveraging their internal reasoning and consistency tendencies.
Papers
BAIT: Boundary-Guided Disclosure Escalation via Self-Conditioned Reasoning