~ similar to 2606.01964· 18 results
The paper introduces Curvature-Conditioned Query (CCQ), a novel read-time contraction mechanism that improves linear attention's performance on long-context and retrieval tasks by incorporating the ge…
The paper analyzes the distinct computational roles of positional versus symbolic attention heads in Transformers, demonstrating that symbolic mechanisms generalize more reliably to longer sequences t…
Yifei Zuo, Dhruv Pai, Zhichen Zeng, Alec Dewulf +2 more
The paper introduces Parallax, a scalable and numerically stable parameterized Local Linear Attention mechanism that significantly improves LLM performance and efficiency compared to existing methods…
Diana Romero, Mutahar Ali, Momin Ahmad Khan, Habiba Farrukh +2 more
This paper introduces the first backdoor attacks against VLM-based scanpath prediction, demonstrating variable-output attacks that evade detection and survive deployment on edge devices.
Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more
The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
The paper tracks the developmental emergence of attention circuits in 1B-class language models, finding that the formation of induction and attention-sink circuits are distinct, temporally separated t…
Yizhuo Lu, Changde Du, Qingyu Shi, Hang Chen +4 more
Mind-Omni introduces a unified multi-task framework that models the interplay between brain, vision, and language signals using a discrete diffusion paradigm, achieving state-of-the-art performance ac…
This paper predicts the aggregate crowd salience of a document from its text before its marks accumulate.
The paper investigates whether modestly sized open-source language models can grasp the semantics of rare Paired-Focus constructions, finding that understanding emerges later in training and correlate…
This paper investigates improving speculative decoding for multilingual LLM inference, finding that n-gram draft models offer consistent speed-ups across languages despite lower token acceptance rates…
Chuang Ma, Qianying Liu, Tomoyuki Obuchi, Fei Cheng +5 more
The paper identifies a failure mode called spatial lexical bias in MLLMs, where adding a spatial word to options biases the model's choice, and demonstrates that this failure originates primarily from…
This study analyzed eye-tracking data in a simulated military environment, finding that multimodal adaptive decision support tools significantly improve mission performance compared to visual-only too…
Zijie Zhou, Dandan Zhu, Hangxiangpan Wang, Heng Zhang +2 more
The paper proposes AsyMoE, a novel Mixture of Experts architecture for Large Vision-Language Models that explicitly models the inherent asymmetry between visual and linguistic modalities, achieving si…
The paper introduces the 'readout-mediator angle' to demonstrate that simple linear probes, while capable of decoding information, often capture directions orthogonal to the model's actual causal comp…
The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…
Garvin Guo, Yu Chen, Xiang Wang, Shuai Li +3 more
The paper deconstructs latent visual reasoning tokens into components and finds that the performance gains are primarily due to boundary markers and attention patterns, not the tokens' ability to enco…
The paper introduces Brain-IT-VQA, a novel framework that significantly improves visual question answering from fMRI signals, and presents NSD-VQA, a new, highly controlled dataset for this task.