Hoagy Cunningham

2 indexed papers

Top categories

AI ×1, NLP ×1, Crypto ×1

Frequent co-authors

Tom Conerly ×1

Brian Chen ×1

Research Timeline

2026

Segment-Level Coherence for Robust Harmful Intent Probing in LLMs

The paper introduces a robust streaming probing objective that requires multiple evidence tokens to support a prediction, significantly improving the detection of harmful intent in LLMs, especially in sensitive CBRN domains.

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

The paper demonstrates that sparse autoencoders can successfully extract a large set of interpretable, causally influential features from the production-scale Claude 3 Sonnet language model.

Papers

cs.AI · Recent · May 28, 2026

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Adly Templeton, Tom Conerly, Jonathan Marcus, Jack Lindsey +22 more

The paper demonstrates that sparse autoencoders can successfully extract a large set of interpretable, causally influential features from the production-scale Claude 3 Sonnet language model.

cs.CL, cs.CR · Recent · Apr 16, 2026