Hao Liao
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces a static analysis pipeline using graph kernels to automatically attribute unknown Android proxy malware to specific commercial proxy networks with high accuracy.
The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural scaffolding and error-survival confounds.
Papers
MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs
Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng +49 more
The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural s…