Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Nils Lukas

Nils Lukas

2 indexed papers

Recent (6 mo)
2
With code
0
Influential cites
0
Benchmarked
0

Publications per year

2
26

Top categories

Crypto×2AI×2Society×2ML×2

Frequent co-authors

Toluwani Aremu2×
Jie Zhang1×
Daniil Ognev1×
Samuele Poppi1×

Research Timeline

2026
Robust Safety Monitoring of Language Models via Activation Watermarking

This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against such threats.

Watermarking Should Be Treated as a Monitoring Primitive

The paper argues that watermarking must be viewed as a monitoring primitive, introducing an observer-based threat model that shows even zero-bit watermarking can enable entity-level attribution through signal aggregation.

Highlighted terms show continued research focus across papers

Papers

cs.CRcs.AIcs.CYRecentMay 13, 2026

Watermarking Should Be Treated as a Monitoring Primitive

Toluwani Aremu, Nils Lukas, Jie Zhang

The paper argues that watermarking must be viewed as a monitoring primitive, introducing an observer-based threat model that shows even zero-bit watermarking can enable entity-level attribution throug…

View →
cs.CRcs.AIcs.CYRecentMar 24, 2026

Robust Safety Monitoring of Language Models via Activation Watermarking

Toluwani Aremu, Daniil Ognev, Samuele Poppi, Nils Lukas

This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against…

View →