Chirag Agarwal

2 indexed papers

Top categories

NLP ×2, AI ×2, ML ×1, Crypto ×1

Frequent co-authors

Eric Onyame (1×)

Runtao Zhou (1×)

Research Timeline

2026

Towards Understanding the Robustness of Sparse Autoencoders

The paper demonstrates that integrating Sparse Autoencoders (SAEs) into transformer residual streams significantly enhances the robustness of Large Language Models against various jailbreak attacks by reshaping the optimization geometry.

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

This study demonstrates that Chain-of-Thought (CoT) monitoring is fundamentally fragile and unreliable for detecting misaligned behavior across typologically diverse languages, especially in low-resource settings.

Papers

cs.CL, cs.AI · Recent · May 27, 2026

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

Eric Onyame, Runtao Zhou, Kowshik Thopalli, Bhavya Kailkhura +1 more

cs.LG, cs.AI, cs.CL · Recent · Apr 20, 2026