~ similar to 2605.31556· 19 results
The paper proposes a neuron-level intervention method to identify and control gender-specific representations (feminine, masculine, and gender-neutral) within large language models, demonstrating prec…
The paper audits six LLMs across four languages, finding that their gender stereotyping is significantly wider than human baselines and that cross-lingual translation fundamentally alters the nature o…
The paper demonstrates that content suppression techniques used in language models only mask prohibited content at the output level, failing to eliminate the underlying concepts from the model's inter…
The paper demonstrates that explicit gender cues systematically affect LLM value trade-offs, causing decision flips that are often masked or misattributed by the models themselves.
Garvin Guo, Yu Chen, Xiang Wang, Shuai Li +3 more
The paper deconstructs latent visual reasoning tokens into components and finds that the performance gains are primarily due to boundary markers and attention patterns, not the tokens' ability to enco…
Chuang Ma, Qianying Liu, Tomoyuki Obuchi, Fei Cheng +5 more
The paper identifies a failure mode called spatial lexical bias in MLLMs, where adding a spatial word to options biases the model's choice, and demonstrates that this failure originates primarily from…
The paper identifies specific attention heads in LLMs responsible for 'cultural binding'—associating cultural items with appropriate identities—and demonstrates that this capability is pre-trained and…
The paper introduces Brain-IT-VQA, a novel framework that significantly improves visual question answering from fMRI signals, and presents NSD-VQA, a new, highly controlled dataset for this task.
Zijie Zhou, Dandan Zhu, Hangxiangpan Wang, Heng Zhang +2 more
The paper proposes AsyMoE, a novel Mixture of Experts architecture for Large Vision-Language Models that explicitly models the inherent asymmetry between visual and linguistic modalities, achieving si…
The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…
Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more
The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.
The paper investigates anthropomorphic reflection markers (like 'hmm' or 'wait') in LLM reasoning and finds that these markers are often surface cues, not necessary for strong reasoning performance.
Yousef A. Radwan, Xuhui Liu, Kilichbek Haydarov, Yuqian Fu +1 more
The paper demonstrates that the valence structure learned by modern LLMs aligns with human EEG emotional representations, but finds that further supervised alignment is ineffective due to a phenomenon…
This paper evaluates biases in multimodal speech recognition by testing how pairing different faces with the same audio affects transcription accuracy, finding significant quality-of-service drops acr…
Kyle Moore, Jesse Roberts, Daryl Watson, William Ward +1 more
This paper investigates whether large language models exhibit uncertainty signals similar to human judgment, examining both overt behavior and internal activation patterns to assess alignment and cali…
The paper analyzes token reduction for efficient unified VLM training, finding that while task-specific acceleration saves computation, it destroys the mutual performance gains achieved through joint…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
The study finds that for a relational intervention to successfully restore a language model's behavior after functional collapse, both a relational structure (e.g., acknowledgment) and a first-person…
Yusheng He, Jizhe Zhou, Xia Du, Zheng Lin +2 more
This paper systematically analyzes how different architectural components of Large Vision-Language Models (LVLMs) contribute to hallucination robustness, finding that joint enhancement of visual fidel…