Nils Lukas

2 indexed papers

Top categories

Crypto ×2, AI ×2, Society ×2, ML ×2

Frequent co-authors

Toluwani Aremu (2×)

Jie Zhang (1×)

Daniil Ognev (1×)

Samuele Poppi (1×)

Research Timeline

2026

Robust Safety Monitoring of Language Models via Activation Watermarking

This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against such threats.

Watermarking Should Be Treated as a Monitoring Primitive

The paper argues that watermarking must be viewed as a monitoring primitive, introducing an observer-based threat model that shows even zero-bit watermarking can enable entity-level attribution through signal aggregation.

Papers

cs.CR · cs.AI · cs.CY · Recent · May 13, 2026

Watermarking Should Be Treated as a Monitoring Primitive

Toluwani Aremu, Nils Lukas, Jie Zhang

cs.CR · cs.AI · cs.CY · Recent · Mar 24, 2026