Yoonpyo Lee
2 indexed papers
Research Timeline
The paper introduces Semantic Intent Fragmentation (SIF), an attack class in which multi-agent AI orchestrators can violate security policies through a composition of individually benign subtasks, even when subtask-level safety checks pass.
The paper demonstrates that fine-tuning safety guard models on benign data can catastrophically collapse their safety alignment, and it proposes Fisher-Weighted Safety Subspace Regularization (FW-SSR) to actively maintain safety geometry.
Papers
Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines
Tanzim Ahad, Ismail Hossain, Md Jahangir Alam, Sai Puppala +3 more