~ similar to 2605.31534 · 20 results
The paper proposes a fast and lightweight novel view synthesis method using a differentiable Multiplane Image (MPI) representation, achieving significant speed and size improvements over state-of-the-…
The paper reframes industrial visual sim-to-real transfer as a domain-gap problem categorized by the availability of explicit object geometry (CAD), arguing that the required prior evidence dictates t…
PatchPoison introduces a lightweight dataset-poisoning method that injects small, high-frequency adversarial patches into multi-view image datasets to systematically corrupt feature matching and degra…
Minseok Joo, Dogyun Park, Taehoon Lee, Kyujin Lee +1 more
The paper proposes COVRAG, a depth-based memory retrieval framework that maximizes the coverage of target-view regions to significantly improve long-term geometric consistency in autoregressive long v…
Reasmory introduces a structured programming framework that uses explicit 3D memory and a Domain-Specific Language (DSL) to reliably enhance Vision-Language Models' spatial reasoning capabilities, ach…
Sayan Paul, Sourav Ghosh, Siddharth Katageri, Soumyadip Maity +2 more
City-Mesh3R is a scalable, end-to-end framework that reconstructs high-fidelity, watertight 3D surface meshes of entire city-scale environments directly from large collections of multi-view images.
Kangrui Wang, Linjie Li, Zhengyuan Yang, Shiqi Chen +6 more
The paper addresses the challenge of multi-turn view planning for VLMs by proposing an iterative framework that uses self-exploration and view graph distillation, significantly improving planning perf…
The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…
Zhipeng Cai, Zhuang Liu, Yunyang Xiong, Zechun Liu +2 more
The paper proposes VLM3, a simple, scalable method that demonstrates that standard Vision-Language Models (VLMs) can natively learn 3D understanding by focusing on architectural simplicity and specific dat…
The paper proposes a real-time, predictive, and task-aware foveated imaging system that dynamically allocates limited sensor bandwidth to task-relevant regions of interest, significantly improving per…
The paper introduces the Image Reconstruction Game, a benchmark showing that the quality of the descriptive model is the primary determinant of image reconstruction success, while the generator's role…
TROPHIES introduces a unified framework to jointly reconstruct dynamic humans, static scenes, and camera poses from multi-view videos, achieving globally consistent and physically plausible 4D reconst…
RayDer introduces a unified, feed-forward transformer that simplifies self-supervised novel view synthesis (NVS) by consolidating camera estimation, scene reconstruction, and rendering into a single,…
PixVOD proposes a fully parallelizable, pixel-distributed framework for visual odometry and depth estimation that performs computations directly on the sensor using Gaussian Belief Propagation.
Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more
The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…
Aoduo Li, Jiancheng Li, Huan Ye, Hongjian Xu +4 more
VEDAL introduces a variational, error-driven asynchronous learning framework to efficiently prune 3D Gaussian Splatting, achieving high compression ratios with minimal loss in novel view synthesis qua…
The paper proposes a unified framework to systematically redefine instance matching for Panoptic Quality evaluation, moving beyond the standard One-to-One matching to accommodate complex scenarios lik…
Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang, Yi Ma +2 more
The paper proposes GASP, a framework that injects fundamental geometric priors directly into Vision-Language Models (VLMs) using ground-truth video geometry, significantly enhancing 3D spatial reasoni…
Xiang Xu, Alan Liang, Youquan Liu, Xian Sun +4 more
The paper introduces U4D, an uncertainty-aware framework that synthesizes 4D LiDAR scenes by prioritizing the reconstruction of geometrically difficult and uncertain regions first, leading to state-of…
The paper argues that benchmarking Vision-Language Models (VLMs) for urban perception must treat human disagreement and non-response as key measurement outcomes, rather than assuming perfect consensus…