Sachin Kumar

2 indexed papers


Top categories

NLP ×2, AI ×2, ML ×2, Crypto ×1

Research Timeline

2026

Activation Differences Reveal Backdoors: A Comparison of SAE Architectures

The paper compares two sparse autoencoder architectures, finding that Differential SAEs (Diff-SAE) significantly outperform Crosscoders in isolating backdoor-related features in language models.

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

This paper systematically diagnoses the failure modes of linear deception probes in LLMs. It finds that single-direction probes are insufficient, but that multi-dimensional probes recover robust detection by leveraging distributed, sub-threshold features, and that probe fragility is an artifact of the training distribution rather than of model scale.


Papers

cs.CL · cs.AI · cs.LG · Recent · May 27, 2026

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

Sachin Kumar


cs.CL · cs.AI · cs.CR · Recent · May 8, 2026