Yisroel Mirsky
2 indexed papers
Research Timeline
The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that breaks these defenses.
The paper addresses the 'agent attribution' problem, the inability to trace a harmful or misbehaving AI agent back to the account that deployed it, by proposing a robust, canary-based protocol that lets vendors identify the responsible user.
Papers
Who Owns This Agent? Tracing AI Agents Back to Their Owners