Papers similar to 2606.02168

20 results

cs.CV, cs.AI, q-bio.NC · Recent · May 28, 2026

Brain-IT-VQA: From Brain Signals to Answers

Roman Beliy, Matias Cosarinsky, Oliver Heinimann, Navve Wasserman +1 more

The paper introduces Brain-IT-VQA, a novel framework that significantly improves visual question answering from fMRI signals, and presents NSD-VQA, a new, highly controlled dataset for this task.

cs.CV, cs.CL · Recent · May 30, 2026

Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding

Hee Suk Yoon, Eunseop Yoon, Jaehyun Jang, SooHwan Eom +5 more

The paper proposes Visual Gradient Steering (VGS), a method that decomposes the distillation loss into language and visual components and steers the optimization to prioritize visual grounding, signif…

cs.LG, cs.AI · Recent · May 27, 2026

Learning Compositional Latent Structure with Vector Networks

Niclas Pokel, Benjamin F. Grewe

The paper introduces the Vector Network (VN), a novel recurrent architecture that replaces fixed weight matrices with reusable weight atoms, enabling superior compositional generalization by making st…

cs.AI, cs.LG · Recent · May 29, 2026

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

Yunpeng Zhou

This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…

cs.CV, cs.AI, cs.MA · Recent · May 29, 2026

Seeing Before Agreeing: Aligning Multi-Agent Consensus with Visual Evidence

Yuhan Wang, Shuochen Chang, Yalin Feng, Dongsheng Ma +7 more

The paper proposes EAGLE, a novel evidence-aligned multi-agent framework, demonstrating that requiring shared visual evidence among agents is crucial for achieving reliable and trustworthy consensus i…

cs.CV · Recent · Jun 1, 2026

InsightVQA: High-Dimensional Emotion-Cognitive Visual Question Answering Benchmark

Shiyu Wang, Ziyu Liu, Chaoyi Yu, Yujie Yin +5 more

The paper introduces InsightVQA, a large-scale benchmark dataset designed for hierarchical visual question answering that assesses complex emotion understanding and cognitive reasoning beyond simple e…

cs.CV, cs.LG · Recent · Jun 1, 2026

CORE-MTL: Rethinking Gradient Balancing via Causal Orthogonal Representations

Chengfeng Wu, Tao Zou, Yanru Wu, Jingge Wang

CORE-MTL proposes a representation-centric framework that uses causal orthogonal representations to disentangle task-relevant structure from nuisance variation in multi-task learning, achieving superi…

cs.CV, cs.AI · Recent · May 28, 2026

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang, Yi Ma +2 more

The paper proposes GASP, a framework that injects fundamental geometric priors directly into Vision-Language Models (VLMs) using ground-truth video geometry, significantly enhancing 3D spatial reasoni…

cs.CV, cs.AI, cs.LG · Recent · May 28, 2026

Learning Context-Conditioned Predicate Semantics via Prototype Feedback

NamGyu Jung, Chang Choi

The paper proposes AlignG, a method that learns context-conditioned predicate semantics by using prototype feedback to adapt relation representations based on image-specific evidence, significantly im…

cs.CV, cs.AI · Recent · May 31, 2026

Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning

Garvin Guo, Yu Chen, Xiang Wang, Shuai Li +3 more

The paper deconstructs latent visual reasoning tokens into components and finds that the performance gains are primarily due to boundary markers and attention patterns, not the tokens' ability to enco…

cs.AI · Recent · May 27, 2026

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

Yubo Li, Ramayya Krishnan, Rema Padman

The paper identifies a failure mode called unfaithful capitulation (UC), where reasoning models maintain a correct internal thought process (chain-of-thought) but output an incorrect final answer when…

cs.CV, cs.AI, cs.CL · Recent · May 27, 2026

VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning

Xingyu Lu, Jinpeng Wang, Yi-Fan Zhang, Yankai Yang +12 more

VCap introduces a novel Witness-Adjudicator reward mechanism that provides highly precise, factually grounded feedback for visual captioning, enabling state-of-the-art performance in RL-trained multim…

cs.CV, cs.AI, cs.LG · Recent · May 29, 2026

Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

Nan Bao, Yifan Zhao, Wenzhuang Wang, Jia Li

The paper proposes a disentangled representation framework to significantly improve few-shot layout-to-image generation by separating semantic identity from local visual details, thereby mitigating re…

cs.CV · Recent · Jun 1, 2026

Training-Free Composed Video Retrieval via Visual Representation-Guided Video-LLM Reasoning

Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai +1 more

The paper proposes a training-free framework, Visual Representation-Guided Video-LLM Reasoning, to perform composed video retrieval by using visual examples and text instructions, achieving strong per…

cs.CV, cs.AI · Recent · May 27, 2026

Bayesian Gated Non-Negative Contrastive Learning

Peng Cui, Jiahao Zhang, Lijie Hu

BayesNCL introduces a probabilistic gating mechanism to resolve the optimization conflict in Contrastive Learning, leading to highly disentangled and semantically consistent representations.

cs.CL, cs.LG · Recent · May 31, 2026

Unlocking the Black Box of Latent Reasoning: An Interpretability-Guided Approach to Intervention

Shuochen Chang, Tong Bai, Xiaofeng Zhang, Qianli Ma +4 more

This paper introduces interpretability-guided, training-free interventions that systematically improve the accuracy and controllability of latent reasoning in LLMs by leveraging structural and causal…

cs.AI · Recent · May 27, 2026

Reasoning Matters: Mitigate Hallucination in Multimodal Large Reasoning Models via Reasoning-Conditioned Preference Optimization

Jiawei Kong, Hao Fang, Shunxiang Liao, Jinyu Li +4 more

The paper proposes Reasoning-Conditioned Direct Preference Optimization (RC-DPO) to effectively mitigate hallucinations in multimodal large reasoning models by explicitly conditioning the preference o…

cs.CV, cs.AI, cs.CL · Recent · May 28, 2026

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

Yue Zhang, Zun Wang, Han Lin, Yonatan Bitton +2 more

This paper introduces a new evaluation framework, SpatialUncertain, demonstrating that current Vision-Language Models (VLMs) are prone to overconfident and incorrect answers to spatial questions when…

cs.CL, cs.LG · Recent · May 30, 2026

Towards Lightweight Reliability: Using Soft Prompts for Hallucination Mitigation in Large Language Models

S M Tahmid Siddiqui, Akib Jawad Ononto, Anoop Singhal, Latifur Khan

The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…

cs.CL, cs.AI · Recent · May 28, 2026

Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders

Pierre-Antoine Lequeu, Camille Barboule, Benjamin Piwowarski

The paper proposes explicitly disentangling positional and semantic representations in Transformer encoders, demonstrating that this separation allows for a clearer understanding of how positional inf…
