Papers similar to 2606.00121

~ similar to 2606.00121· 17 results

cs.SDcs.AIRecentMay 29, 2026

MindVoice: Reconstructing Intelligible Speech from Non-invasive Neural Signals with Pretrained Priors

Guangyin Bao, Taiping Zeng, Jianfeng Feng, Xiangyang Xue

MindVoice is a neuro-to-speech framework that uses pretrained priors to disentangle and reconstruct intelligible speech from noisy, non-invasive neural signals, significantly outperforming existing me…

View →

cs.CVcs.AIq-bio.NCRecentMay 28, 2026

Brain-IT-VQA: From Brain Signals to Answers

Roman Beliy, Matias Cosarinsky, Oliver Heinimann, Navve Wasserman +1 more

The paper introduces Brain-IT-VQA, a novel framework that significantly improves visual question answering from fMRI signals, and presents NSD-VQA, a new, highly controlled dataset for this task.

View →

cs.AIRecentMay 28, 2026

Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion

Yizhuo Lu, Changde Du, Qingyu Shi, Hang Chen +4 more

Mind-Omni introduces a unified multi-task framework that models the interplay between brain, vision, and language signals using a discrete diffusion paradigm, achieving state-of-the-art performance ac…

View →

cs.CVRecentJun 1, 2026

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…

View →

cs.CVcs.AIcs.LGRecentMay 29, 2026

Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

Nan Bao, Yifan Zhao, Wenzhuang Wang, Jia Li

The paper proposes a disentangled representation framework to significantly improve few-shot layout-to-image generation by separating semantic identity from local visual details, thereby mitigating re…

View →

cs.LGcs.AIcs.CVRecentMay 28, 2026

Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification

Hwa Hui Tew, Junn Yong Loo, Fang Yu Leong, Julia K. Lau +5 more

The paper introduces Dual-Spectral Flow Matching (DSFM), a novel generative framework that uses wavelet and cosine transforms to synthesize highly realistic, non-stationary fMRI time series for improv…

View →

cs.CLcs.LGRecentJun 1, 2026

Resonant Context Anchoring: Decoupling Attention Routing and Signal Gain at Inference Time

Mingkuan Zhao, Yide Gao, Wentao Hu, Suquan Chen +5 more

The paper proposes Resonant Context Anchoring (RCA), a lightweight, training-free method that enhances factual faithfulness in LLMs by dynamically amplifying the signal of external context evidence du…

View →

cs.CVcs.CLRecentMay 29, 2026

Learning from Fine-Grained Visual Discrepancies: Mitigating Multimodal Hallucinations via In-Context Visual Contrastive Optimization

Haolin Deng, Xin Zou, Zhiwei Jin, Chen Chen +2 more

The paper proposes In-Context Visual Contrastive Optimization (IC-VCO) to rigorously mitigate multimodal hallucinations in Vision-Language Models by optimizing contrastive learning within a shared mul…

View →

cs.CLcs.LGRecentMay 30, 2026

Towards Lightweight Reliability: Using Soft Prompts for Hallucination Mitigation in Large Language Models

S M Tahmid Siddiqui, Akib Jawad Ononto, Anoop Singhal, Latifur Khan

The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…

View →

cs.CVcs.CLRecentMay 31, 2026

Reasmory: 3D Reconstruction as Explicit Memory for VLMs Spatial Reasoning

Jixuan He, Xueting Li, Chieh Hubert Lin, Ming-Hsuan Yang

Reasmory introduces a structured programming framework that uses explicit 3D memory and a Domain-Specific Language (DSL) to reliably enhance Vision-Language Models' spatial reasoning capabilities, ach…

View →

cs.CVcs.LGRecentJun 1, 2026

Hallucination-Aware Diffusion Sampling for Inverse Problems via Robust Prior Updates

Pengfei Jin, Yiqi Tian, Kailong Fan, Bingjie Qi +1 more

The paper introduces Robust Prior Update (RPU), a module that improves the faithfulness of diffusion-based inverse solvers by stabilizing the prior update step, thereby reducing measurement-conditione…

View →

cs.CVcs.AIRecentMay 29, 2026

What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness

Yusheng He, Jizhe Zhou, Xia Du, Zheng Lin +2 more

This paper systematically analyzes how different architectural components of Large Vision-Language Models (LVLMs) contribute to hallucination robustness, finding that joint enhancement of visual fidel…

View →

cs.CVcs.AIcs.CLRecentJun 1, 2026

The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue

Sherzod Hakimov, Mattia D'Agostini, Ivan Samodelkin, David Schlangen

The paper introduces the Image Reconstruction Game, a benchmark showing that the quality of the descriptive model is the primary determinant of image reconstruction success, while the generator's role…

View →

cs.CVcs.AIRecentMay 31, 2026

Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning

Garvin Guo, Yu Chen, Xiang Wang, Shuai Li +3 more

The paper deconstructs latent visual reasoning tokens into components and finds that the performance gains are primarily due to boundary markers and attention patterns, not the tokens' ability to enco…

View →

cs.AIRecentMay 28, 2026

VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing

Haoyuan Shi, Xiancong Ren, Yingji Zhang, Qinfan Zhang +8 more

VLA-Trace is a diagnostic framework that analyzes Vision-Language-Action (VLA) models by tracing their internal representations and external behaviors, revealing that while these models are good at vi…

View →

cs.CLcs.CVRecentJun 1, 2026

Mechanistic Diagnostics of Spatial Lexical Bias in Multimodal Large Language Model Spatial Reasoning

Chuang Ma, Qianying Liu, Tomoyuki Obuchi, Fei Cheng +5 more

The paper identifies a failure mode called spatial lexical bias in MLLMs, where adding a spatial word to options biases the model's choice, and demonstrates that this failure originates primarily from…

View →

cs.AIRecentMay 31, 2026

Brain-Atlas-Guided Generative Counterfactual Attention for Explainable Cognitive Decline Diagnosis Using Multimodal Connectomes

Xiongri Shen, Jiaqi Wang, Zhenxi Song, Yi Zhong +4 more

The paper proposes a novel Generative Counterfactual Attention-guided Network (GCAN) that uses multimodal connectomes and brain atlas knowledge to provide explainable and highly accurate diagnosis of…

View →