Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Chirag Agarwal

Chirag Agarwal

2 indexed papers

Recent (6 mo)
2
With code
0
Influential cites
0
Benchmarked
0

Publications per year

2
26

Top categories

NLP×2AI×2ML×1Crypto×1

Frequent co-authors

Eric Onyame1×
Runtao Zhou1×
Kowshik Thopalli1×
Bhavya Kailkhura1×
Ahson Saiyed1×
Sabrina Sadiekh1×

Research Timeline

2026
Towards Understanding the Robustness of Sparse Autoencoders

The paper demonstrates that integrating Sparse Autoencoders (SAEs) into transformer residual streams significantly enhances the robustness of Large Language Models against various jailbreak attacks by reshaping the optimization geometry.

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

This study demonstrates that Chain-of-Thought (CoT) monitoring is fundamentally fragile and unreliable for detecting misaligned behavior across typologically diverse languages, especially in low-resource settings.

Highlighted terms show continued research focus across papers

Papers

cs.CLcs.AIRecentMay 27, 2026

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

Eric Onyame, Runtao Zhou, Kowshik Thopalli, Bhavya Kailkhura +1 more

This study demonstrates that Chain-of-Thought (CoT) monitoring is fundamentally fragile and unreliable for detecting misaligned behavior across typologically diverse languages, especially in low-resou…

View →
cs.LGcs.AIcs.CLRecentApr 20, 2026

Towards Understanding the Robustness of Sparse Autoencoders

Ahson Saiyed, Sabrina Sadiekh, Chirag Agarwal

The paper demonstrates that integrating Sparse Autoencoders (SAEs) into transformer residual streams significantly enhances the robustness of Large Language Models against various jailbreak attacks by…

View →