Marina Krek
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
This paper introduces the first backdoor attack specifically targeting pipeline parallelism in decentralized post-training, demonstrating that a limited adversary controlling an intermediate stage can significantly degrade model alignment.
MASCing is a novel framework that enables flexible, non-retraining reconfiguration of Mixture-of-Experts (MoE) models for specific safety objectives by applying activation steering masks to control expert selection.
Papers
MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks
Jona te Lintelo, Lichao Wu, Marina Krček, Sengim Karayalçin +1 more
MASCing is a novel framework that enables flexible, non-retraining reconfiguration of Mixture-of-Experts (MoE) models for specific safety objectives by applying activation steering masks to control ex…