Ismail Hossain
6 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces Semantic Intent Fragmentation (SIF), an attack class demonstrating that multi-agent AI orchestrators can violate security policies through a composition of individually benign subtasks, even when subtask-level safety checks pass.
The paper demonstrates that fine-tuning safety guard models on benign data can catastrophically collapse their safety alignment, proposing Fisher-Weighted Safety Subspace Regularization (FW-SSR) to actively maintain safety geometry.
This paper addresses the lack of systematic infrastructure for evaluating jailbreak attacks by introducing a large-scale dataset, an automated generation method, and a continuous evaluation metric that surpasses traditional binary scoring.
The paper identifies the Misattribution Gap, showing that memory-layer attacks (Semantic Norm Drift) can mimic model failure in multi-agent AI systems, and proposes novel detection and mitigation techniques.
The paper introduces SkillVetBench, a novel two-stage benchmark that effectively detects and verifies malicious behavior hidden within open agentic skills, significantly outperforming static and semantic-only detection methods.
The paper introduces SkillVetBench, a novel two-stage benchmark that effectively detects and verifies malicious behavior in open agentic skill ecosystems, significantly outperforming existing static and semantic-only detection methods.
Papers
Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems
Ismail Hossain, Sai Puppala, Zhuoran Lu, Sajedul Talukder +1 more
The paper introduces SkillVetBench, a novel two-stage benchmark that effectively detects and verifies malicious behavior hidden within open agentic skills, significantly outperforming static and seman…