ArXivCSExplorer

Daniil Ognev

1 indexed paper

Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 1

Top categories

Crypto ×1, AI ×1, Society ×1, ML ×1

Frequent co-authors

Toluwani Aremu (1×)
Samuele Poppi (1×)
Nils Lukas (1×)

Research Timeline

2026
Robust Safety Monitoring of Language Models via Activation Watermarking

This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against such threats.


Papers

cs.CR, cs.AI, cs.CY · Recent · Mar 24, 2026

Robust Safety Monitoring of Language Models via Activation Watermarking

Toluwani Aremu, Daniil Ognev, Samuele Poppi, Nils Lukas

This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against…
