Mayank Singh
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: 1 (2026)
Top categories
NLP ×1, AI ×1, ML ×1
Research Timeline
2026
Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
The paper introduces retraining-free frameworks (Meow2X and TRNE) that mechanistically localize and suppress toxicity within language models by analyzing activation differences, achieving safety improvements without costly retraining.