Ahson Saiyed

1 indexed paper

Top categories

ML ×1, AI ×1, NLP ×1, Crypto ×1

Frequent co-authors

Sabrina Sadiekh (1×)

Chirag Agarwal (1×)

Research Timeline

2026

Towards Understanding the Robustness of Sparse Autoencoders

The paper demonstrates that integrating Sparse Autoencoders (SAEs) into transformer residual streams significantly enhances the robustness of Large Language Models against various jailbreak attacks by reshaping the optimization geometry.

Papers

cs.LG · cs.AI · cs.CL · Recent · Apr 20, 2026

Towards Understanding the Robustness of Sparse Autoencoders

Ahson Saiyed, Sabrina Sadiekh, Chirag Agarwal
