Sabrina Sadiekh
2 indexed papers
Research Timeline
The paper demonstrates that integrating Sparse Autoencoders (SAEs) into transformer residual streams significantly enhances the robustness of Large Language Models against various jailbreak attacks by reshaping the optimization geometry.
GLiNER Guard (GLiGuard) introduces a unified, efficient encoder family that simultaneously performs safety classification and PII detection in a single forward pass, offering a practical, low-cost alternative for always-on LLM moderation.
Papers
GLiNER Guard: Unified Encoder Family for Production LLM Safety and Privacy