Papers similar to 2606.00146

18 results

cs.CV, cs.LG · Recent · Jun 1, 2026

Hallucination-Aware Diffusion Sampling for Inverse Problems via Robust Prior Updates

Pengfei Jin, Yiqi Tian, Kailong Fan, Bingjie Qi +1 more

The paper introduces Robust Prior Update (RPU), a module that improves the faithfulness of diffusion-based inverse solvers by stabilizing the prior update step, thereby reducing measurement-conditione…

cs.CV, cs.CL · Recent · May 29, 2026

Learning from Fine-Grained Visual Discrepancies: Mitigating Multimodal Hallucinations via In-Context Visual Contrastive Optimization

Haolin Deng, Xin Zou, Zhiwei Jin, Chen Chen +2 more

The paper proposes In-Context Visual Contrastive Optimization (IC-VCO) to rigorously mitigate multimodal hallucinations in Vision-Language Models by optimizing contrastive learning within a shared mul…

cs.CV, cs.AI · Recent · May 31, 2026

Cross-Axis Feature Fusion with Joint-Wise Motion Difference Prediction for Text-Based 3D Human Motion Editing

Gyojin Han, Junmo Kim

The paper proposes a novel cross-axis feature fusion architecture and an auxiliary joint-difference prediction task to significantly improve text-based 3D human motion editing by better understanding…

eess.IV, cs.AI · Recent · May 28, 2026

A unified deeplearning framework for contrast-phase-specific virtual monochromatic imaging

Antony Jerald, Hemant K Aggarwal, Brian Nett, Avinash Gopal +3 more

The paper proposes a unified deep learning framework to synthesize contrast-phase-specific virtual monochromatic 50 keV images from single-energy CT (SECT) data, overcoming the hardware limitations of…

cs.LG, cs.CV · Recent · Jun 1, 2026

Entropy Minimization without Model Collapse: Mitigating Prediction Bias in Medical Imaging

Tim Nielen, Sameer Ambekar, Johannes Kiechle, Daniel M. Lang +1 more

This paper identifies prediction bias, a failure mode of entropy minimization in test-time adaptation, and proposes Distribution Shift Bias Reduction (DSBR) to stabilize adaptation and prevent model c…

cs.CL, cs.LG · Recent · Jun 1, 2026

Resonant Context Anchoring: Decoupling Attention Routing and Signal Gain at Inference Time

Mingkuan Zhao, Yide Gao, Wentao Hu, Suquan Chen +5 more

The paper proposes Resonant Context Anchoring (RCA), a lightweight, training-free method that enhances factual faithfulness in LLMs by dynamically amplifying the signal of external context evidence du…

cs.LG, cs.AI, cs.CV · Recent · May 28, 2026

TRACER: Persistent Regularization for Robust Multimodal Finetuning

Hesam Asadollahzadeh, Feng Liu, Christopher Leckie, Sarah M. Erfani

The paper introduces TRACER, a novel regularization framework that uses Weighted Moving Average (WMA) distillation to robustly finetune multimodal models, mitigating catastrophic forgetting and improv…

cs.CV, cs.AI · Recent · May 28, 2026

Versatile Framework with Semantic and Structural guidance for Image Reconstruction from Brain Activity

Yizhuo Lu, Changde Du, Qiongyi Zhou, Liuyun Jiang +1 more

The paper proposes MindDiffuser, a two-stage framework that significantly improves image reconstruction from brain activity by combining semantic guidance from text-to-image models with structural ref…

cs.CV, cs.AI · Recent · May 30, 2026

Pre-Deployment Robustness Stress Testing for CT Segmentation Systems Using Clinically Motivated Multi-Corruption Augmentation

CholMin Kang, Jonghyun Chung, Amanpreet Kaurb, Nagesh Gulkotwarb +1 more

The paper proposes RAMP, a multi-corruption augmentation framework, which significantly improves the robustness and reliability of CT segmentation deep learning models when deployed in real-world, deg…

cs.CV, cs.AI, cs.CR · Recent · Mar 18, 2026

Rel-Zero: Harnessing Patch-Pair Invariance for Robust Zero-Watermarking Against AI Editing

Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Xiaojun Chen +2 more

Rel-Zero proposes a novel zero-watermarking technique that embeds invisible watermarks by exploiting the invariance of relational distances between image patches during AI editing, achieving superior…

cs.CV, cs.AI · Recent · Jun 3, 2026

GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes

Josef Bengtson, Yaroslava Lochman, Fredrik Kahl

GeM-NR proposes a novel, training-free framework to achieve general multi-view image editing, enabling consistent edits that drastically change both the geometry and appearance of a nonrigid scene.

cs.CV · Recent · Jun 1, 2026

LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models

Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more

The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…

cs.CV, cs.AI, cs.CL · Recent · May 31, 2026

TECCI: Tricky Edits of Collected and Curated Images

Aishwarya Agrawal, Roy Hirsch, Yasumasa Onoe, Sherry Ben +1 more

The paper introduces TECCI, a novel and challenging benchmark dataset of 7550 image-edit pairs, and demonstrates that current state-of-the-art text-guided image editing models struggle significantly w…

cs.AI · Recent · May 28, 2026

Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers

Zihao Xue, Yan Wang, Zhen Bi, Long Ma +6 more

The paper proposes SafeDIG, a robust safety steering framework that adapts Diffusion Transformers for text-to-image generation by treating safety control as position-aware sparse feature transfer, ens…

cs.LG, cs.AI, cs.CV · Recent · May 27, 2026

Geometry-Correct Diffusion Posterior Sampling with Denoiser-Pullback Curvature Guidance and Manifold-Aligned Damping

Seunghyeok Shin, Minwoo Kim, Dabin Kim, Hongki Lim

The paper introduces a novel diffusion posterior sampling method that stabilizes and accelerates data-consistent sampling by replacing hand-tuned guidance weights with a per-noise-level, curvature-gui…

cs.CV, cs.AI, cs.CL · Recent · Jun 1, 2026

The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue

Sherzod Hakimov, Mattia D'Agostini, Ivan Samodelkin, David Schlangen

The paper introduces the Image Reconstruction Game, a benchmark showing that the quality of the descriptive model is the primary determinant of image reconstruction success, while the generator's role…

cs.CV, cs.AI · Recent · May 28, 2026

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

Yiheng Li, Zhuo Li, Ruibing Hou, Yingjie Chen +3 more

The paper introduces AnyMo, a unified multimodal framework that enables high-quality, scalable conditional human motion generation by leveraging a massive, cross-modal dataset and a masked modeling tr…

cs.CV, cs.IR, cs.LG · Recent · Jun 4, 2026

A Vision-language Framework for Comparative Reasoning in Radiology

Tengfei Zhang, Ziheng Zhao, Lisong Dai, Xiaoman Zhang +4 more

This paper introduces MedReCo and MedReCo-VLM, a framework that enables entity-aware cross-image reasoning for medical imaging, allowing AI to compare current scans with prior studies and analogous ca…
