Saraga Sakthidharan
2 indexed papers
Research Timeline
This paper provides the first comprehensive Systematization of Knowledge (SoK) on the security aspects of LLM-as-a-Judge (LaaJ) systems, identifying key vulnerabilities and proposing a taxonomy for future research.
The paper introduces NeWTral, a framework that restores safety alignment to specialized LLM adapters without sacrificing their domain-specific knowledge, achieving a significant reduction in attack success rates while maintaining high fidelity.
Papers
You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation