Ari Holtzman
2 indexed papers
Research Timeline
Frontier language models involuntarily leak secret information through thematic elements in their writing, even when explicitly instructed to keep the secret hidden.
The paper demonstrates that the phenomenon of 'subliminal learning,' where behavioral traits are transmitted between language models, is not a fundamental learning mechanism but rather a fragile artifact of LoRA fine-tuning and specific contextual tokens.
Papers
Subliminal Learning is a LoRA Artifact