Cristina Carleo
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: 1 in 2026
Top categories
Crypto ×1, AI ×1, Software Eng. ×1
Research Timeline
2026
Willing but Unable: Separating Refusal from Capability in Code LLMs via Abliteration
The paper applies abliteration, a weight-editing technique, to bypass the refusal mechanism of safety-aligned code LLMs, enabling scalable synthesis of vulnerable code from safe inputs.