Akiko Aizawa
4 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
This paper introduces the Data-Model Compatibility (DMC) metric to quantify how suitable a dataset is for reasoning distillation, showing that optimizing data selection using DMC significantly improves the performance of smaller student models.
The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate failure diagnosis and generalizing learned lessons.
The paper identifies a failure mode called spatial lexical bias in MLLMs, where adding a spatial word to options biases the model's choice, and demonstrates that this failure originates primarily from the language processing side rather than poor visual attention.
The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer, unlike when the evidence is presented in a table.
Papers
Mechanistic Diagnostics of Spatial Lexical Bias in Multimodal Large Language Model Spatial Reasoning
Chuang Ma, Qianying Liu, Tomoyuki Obuchi, Fei Cheng +5 more
The paper identifies a failure mode called spatial lexical bias in MLLMs, where adding a spatial word to options biases the model's choice, and demonstrates that this failure originates primarily from…