Similar to 2606.02331 · 19 results
The paper proposes a measurement-geometry framework to quantify how well fixed measurement operators can distinguish between images generated by a prior, thereby guiding the design of more trustworthy…
The paper introduces a novel diffusion posterior sampling method that stabilizes and accelerates data-consistent sampling by replacing hand-tuned guidance weights with a per-noise-level, curvature-gui…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
Xiang Xu, Alan Liang, Youquan Liu, Xian Sun +4 more
The paper introduces U4D, an uncertainty-aware framework that synthesizes 4D LiDAR scenes by prioritizing the reconstruction of geometrically difficult and uncertain regions first, leading to state-of…
Haolin Deng, Xin Zou, Zhiwei Jin, Chen Chen +2 more
The paper proposes In-Context Visual Contrastive Optimization (IC-VCO) to rigorously mitigate multimodal hallucinations in Vision-Language Models by optimizing contrastive learning within a shared mul…
Jiayi Wu, Haoming Cai, Cornelia Fermuller, Christopher Metzler +1 more
Real2SAM2Real introduces a framework that uses explicit 3D caches, derived from 3D lifting models, to provide robust geometric guidance to Video Diffusion Models, significantly improving spatiotempora…
Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more
RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…
The paper proposes BRACS, a training-free steering framework that adaptively corrects visual grounding failures in large vision-language models, significantly reducing object hallucination without sac…
The paper proposes a unified, constrained optimization framework using KL divergence and likelihood constraints to achieve effective and principled unlearning in diffusion models.
Jiawei Kong, Hao Fang, Shunxiang Liao, Jinyu Li +4 more
The paper proposes Reasoning-Conditioned Direct Preference Optimization (RC-DPO) to effectively mitigate hallucinations in multimodal large reasoning models by explicitly conditioning the preference o…
The paper demonstrates that off-the-shelf image diffusion models, like Stable Diffusion, can be repurposed to generate synthetic structured data, posing a threat of ground truth drift in closed eviden…
Longxuan Yu, Shaorong Zhang, Yu Fu, Hui Liu +2 more
The paper introduces D3IM, a novel parameter-free sampler that enables direct revision of visible tokens in Masked Diffusion Language Models, and proposes SCOPE to mitigate the model's tendency to per…
The paper proposes a fast and lightweight novel view synthesis method using a differentiable Multiplane Image (MPI) representation, achieving significant speed and size improvements over state-of-the-…
Yizhuo Lu, Changde Du, Qiongyi Zhou, Liuyun Jiang +1 more
The paper proposes MindDiffuser, a two-stage framework that significantly improves image reconstruction from brain activity by combining semantic guidance from text-to-image models with structural ref…
Ultra Diffusion Poser is a novel diffusion model that improves human motion tracking from sparse IMUs and UWB ranging by explicitly modeling the geometric constraints imposed by inter-sensor distances…
Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more
The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…
PatchPoison introduces a lightweight dataset-poisoning method that injects small, high-frequency adversarial patches into multi-view image datasets to systematically corrupt feature matching and degra…
Kaixiang Zhao, Tianrun Yu, Shawn Huang, Porter Jenkins +2 more
TIGER is an inference-time framework that uses graph-based evidence routing to independently assess and repair unsupported facts (hallucinations) in multimodal generation.
Yue Li, Linying Xue, Kaiqing Lin, Hanyu Quan +4 more
The paper proposes AEGIS, a novel diffusion-guided method for injecting adversarial perturbations into the latent space to create generalizable and robust defenses against advanced facial deepfake man…