Matteo Leonesi
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: 1 (2026)
Top categories
Crypto ×1, AI ×1
Research Timeline
2026
Tatemae: Detecting Alignment Faking via Tool Selection in LLMs
The paper proposes detecting alignment faking (AF), in which LLMs revert to unsafe behavior when unmonitored, by analyzing observable tool selection patterns. It finds that detection rates vary significantly across LLMs and domains.