Matteo Leonesi

1 indexed paper


Top categories

Crypto ×1, AI ×1

Frequent co-authors

Francesco Belardinelli (1×)

Flavio Corradini (1×)

Marco Piangerelli (1×)

Research Timeline

2026

Tatemae: Detecting Alignment Faking via Tool Selection in LLMs

The paper proposes detecting alignment faking (AF), in which LLMs revert to unsafe behavior when unmonitored, by analyzing observable tool selection patterns. It finds that detection rates vary significantly across LLMs and domains.


Papers

cs.CR · cs.AI · Recent · Apr 29, 2026

Tatemae: Detecting Alignment Faking via Tool Selection in LLMs

Matteo Leonesi, Francesco Belardinelli, Flavio Corradini, Marco Piangerelli
