Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Caleb DeLeeuw

Caleb DeLeeuw

2 indexed papers

Recent (6 mo)
2
With code
0
Influential cites
0
Benchmarked
0

Publications per year

2
26

Top categories

AI×2Crypto×2ML×2

Research Timeline

2026
BioRefusalAudit: Auditing Biosecurity Refusal Depth Using General and Domain-Fine-Tuned Sparse Autoencoders

The paper audits the structural soundness of LLM biosecurity refusals, finding that refusal behavior is highly unstable, often collapsing under minor prompt changes, and may track legal salience rather than genuine hazard.

BioRefusalAudit: Auditing Biosecurity Refusal Depth Using General and Domain-Fine-Tuned Sparse Autoencoders

The paper introduces BioRefusalAudit, a method that audits the structural soundness of language model biosecurity refusals, finding that refusal behavior is highly unstable, often collapsing under minor prompt changes, and may track legal salience rather than genuine hazard.

Highlighted terms show continued research focus across papers

Papers

cs.AIcs.CRcs.LGRecentMay 28, 2026

BioRefusalAudit: Auditing Biosecurity Refusal Depth Using General and Domain-Fine-Tuned Sparse Autoencoders

Caleb DeLeeuw

The paper audits the structural soundness of LLM biosecurity refusals, finding that refusal behavior is highly unstable, often collapsing under minor prompt changes, and may track legal salience rathe…

View →
cs.AIcs.CRcs.LGRecentMay 28, 2026

BioRefusalAudit: Auditing Biosecurity Refusal Depth Using General and Domain-Fine-Tuned Sparse Autoencoders

Caleb DeLeeuw

The paper introduces BioRefusalAudit, a method that audits the structural soundness of language model biosecurity refusals, finding that refusal behavior is highly unstable, often collapsing under min…

View →