~ similar to 2605.27906· 20 results
Haolin Deng, Xin Zou, Zhiwei Jin, Chen Chen +2 more
The paper proposes In-Context Visual Contrastive Optimization (IC-VCO) to rigorously mitigate multimodal hallucinations in Vision-Language Models by optimizing contrastive learning within a shared mul…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
Yang Zhang, Xiaoshuai Sun, Rui Zhao, Wujin Sun +4 more
The paper proposes CSMR, a cognitive scheduling framework that allows a language model to dynamically decide when to acquire task-relevant visual evidence, significantly improving multimodal reasoning…
This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…
The paper introduces two methods, ermodel and ermodel, to significantly reduce hallucinations in clinical summarization by using hallucination detectors to guide iterative revisions and subsequently…
Yusheng He, Jizhe Zhou, Xia Du, Zheng Lin +2 more
This paper systematically analyzes how different architectural components of Large Vision-Language Models (LVLMs) contribute to hallucination robustness, finding that joint enhancement of visual fidel…
COFT is a training-free decoding method that significantly reduces societal biases in large language model chain-of-thought reasoning by applying token-level fairness control at decode time.
Chuang Ma, Qianying Liu, Tomoyuki Obuchi, Fei Cheng +5 more
The paper identifies a failure mode called spatial lexical bias in MLLMs, where adding a spatial word to options biases the model's choice, and demonstrates that this failure originates primarily from…
Garvin Guo, Yu Chen, Xiang Wang, Shuai Li +3 more
The paper deconstructs latent visual reasoning tokens into components and finds that the performance gains are primarily due to boundary markers and attention patterns, not the tokens' ability to enco…
Hee Suk Yoon, Eunseop Yoon, Jaehyun Jang, SooHwan Eom +5 more
The paper proposes Visual Gradient Steering (VGS), a method that decomposes the distillation loss into language and visual components and steers the optimization to prioritize visual grounding, signif…
The paper proposes BRACS, a training-free steering framework that adaptively corrects visual grounding failures in large vision-language models, significantly reducing object hallucination without sac…
Mahtab Bigverdi, Lindsey Li, Weikai Huang, Yiming Liu +7 more
This paper introduces Imaginative Perception Tokens (IPT) to improve spatial reasoning in vision language models.
Mingkuan Zhao, Yide Gao, Wentao Hu, Suquan Chen +5 more
The paper proposes Resonant Context Anchoring (RCA), a lightweight, training-free method that enhances factual faithfulness in LLMs by dynamically amplifying the signal of external context evidence du…
Xiang Li, Jiwei Wei, Ke Liu, Yitong Qin +4 more
The eMoT framework enhances multi-step reasoning in LLMs by treating reasoning as an evolving memory, stabilizing performance through symbolic computation and structured refinement.
The paper introduces Brain-IT-VQA, a novel framework that significantly improves visual question answering from fMRI signals, and presents NSD-VQA, a new, highly controlled dataset for this task.
Yunhai Hu, Zining Liu, Xiangyang Yin, Tianhua Xia +4 more
DREAM-R is a novel framework that significantly enhances speculative reasoning in large multimodal models by optimizing draft generation alignment, introducing a robust verification mechanism, and ena…
The paper proposes a memory-augmented, three-stage agentic pipeline that significantly reduces LLM hallucinations and improves operational efficiency by integrating semantic caching and advanced obser…
This paper investigates the production-evaluation gap in Large Reasoning Models (LRMs), finding that while LRMs excel at generating solutions, they struggle significantly to evaluate flawed reasoning,…
The paper introduces Evidence-Carrying Agents (ECA) to prevent multimodal agents from executing privileged actions based on unsupported or hallucinated perceptual claims, achieving near-zero unsafe ex…
Seojeong Park, Jiho Choi, Junyong Kang, Seonho Lee +2 more
The paper addresses Perceptual Judgment Bias in multimodal LLM judges by introducing a new dataset and a unified training framework that forces models to prioritize visual evidence over plausible text…