Papers similar to 2605.31534

20 results

cs.CV · cs.AI · Recent · Jun 1, 2026

Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image

The paper proposes a fast and lightweight novel view synthesis method using a differentiable Multiplane Image (MPI) representation, achieving significant speed and size improvements over state-of-the-…

cs.CV · cs.AI · cs.RO · Recent · May 28, 2026

Prior Availability in Industrial Visual Sim-to-Real: A Review of CAD-Guided and CAD-Unavailable Regimes

Chenxi Tao, Seung-Kyum Choi

The paper reframes industrial visual sim-to-real transfer as a domain-gap problem categorized by the availability of explicit object geometry (CAD), arguing that the required prior evidence dictates t…

cs.CV · cs.CR · cs.LG · Recent · Apr 14, 2026

PatchPoison: Poisoning Multi-View Datasets to Degrade 3D Reconstruction

Prajas Wadekar, Venkata Sai Pranav Bachina, Kunal Bhosikar, Ankit Gangwal +1 more

PatchPoison introduces a lightweight dataset-poisoning method that injects small, high-frequency adversarial patches into multi-view image datasets to systematically corrupt feature matching and degra…

cs.CV · Recent · Jun 1, 2026

Retrieve What's Missing: Coverage-Maximizing Retrieval for Consistent Long Video Generation

Minseok Joo, Dogyun Park, Taehoon Lee, Kyujin Lee +1 more

The paper proposes COVRAG, a depth-based memory retrieval framework that maximizes the coverage of target-view regions to significantly improve long-term geometric consistency in autoregressive long v…

cs.CV · cs.CL · Recent · May 31, 2026

Reasmory: 3D Reconstruction as Explicit Memory for VLMs Spatial Reasoning

Jixuan He, Xueting Li, Chieh Hubert Lin, Ming-Hsuan Yang

Reasmory introduces a structured programming framework that uses explicit 3D memory and a Domain-Specific Language (DSL) to reliably enhance Vision-Language Models' spatial reasoning capabilities, ach…

cs.CV · cs.AI · cs.GR · Recent · May 28, 2026

City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images

Sayan Paul, Sourav Ghosh, Siddharth Katageri, Soumyadip Maity +2 more

City-Mesh3R is a scalable, end-to-end framework that reconstructs high-fidelity, watertight 3D surface meshes of entire city-scale environments directly from large collections of multi-view images.

cs.AI · cs.CV · cs.RO · Recent · May 28, 2026

Planning with the Views via Scene Self-Exploration

Kangrui Wang, Linjie Li, Zhengyuan Yang, Shiqi Chen +6 more

The paper addresses the challenge of multi-turn view planning for VLMs by proposing an iterative framework that uses self-exploration and view graph distillation, significantly improving planning perf…

cs.CV · Recent · Jun 1, 2026

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…

cs.CV · cs.AI · Recent · May 28, 2026

VLM3: Vision Language Models Are Native 3D Learners

Zhipeng Cai, Zhuang Liu, Yunyang Xiong, Zechun Liu +2 more

The paper proposes VLM3, a simple, scalable method that demonstrates standard Vision Language Models (VLMs) can natively learn 3D understanding by focusing on architectural simplicity and specific dat…

cs.CV · Recent · Jun 1, 2026

Policy-based Foveated Imaging and Perception

Howard Xiao, Jan Ackermann, Boyang Deng, Gordon Wetzstein

The paper proposes a real-time, predictive, and task-aware foveated imaging system that dynamically allocates limited sensor bandwidth to task-relevant regions of interest, significantly improving per…

cs.CV · cs.AI · cs.CL · Recent · Jun 1, 2026

The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue

Sherzod Hakimov, Mattia D'Agostini, Ivan Samodelkin, David Schlangen

The paper introduces the Image Reconstruction Game, a benchmark showing that the quality of the descriptive model is the primary determinant of image reconstruction success, while the generator's role…

cs.CV · Recent · Jun 1, 2026

TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos

Jinpeng Liu, Yukang Xu, Yutong Li, Xingyu Liu

TROPHIES introduces a unified framework to jointly reconstruct dynamic humans, static scenes, and camera poses from multi-view videos, achieving globally consistent and physically plausible 4D reconst…

cs.CV · cs.AI · cs.LG · Recent · May 29, 2026

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

Ulrich Prestel, Stefan Andreas Baumann, Nick Stracke, Björn Ommer

RayDer introduces a unified, feed-forward transformer that simplifies self-supervised novel view synthesis (NVS) by consolidating camera estimation, scene reconstruction, and rendering into a single,…

cs.CV · Recent · Jun 2, 2026

PixVOD: Pixel-Distributed Direct Visual Odometry and Depth Estimation

Shinjeong Kim, Ignacio Alzugaray, Callum Rhodes, Paul H. J. Kelly +1 more

PixVOD proposes a fully parallelizable, pixel-distributed framework for visual odometry and depth estimation that performs computations directly on the sensor using Gaussian Belief Propagation.

cs.CV · Recent · Jun 1, 2026

LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models

Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more

The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…

cs.CV · Recent · Jun 1, 2026

VEDAL: Variational Error-Driven Asynchronous Learning for 3D Gaussian Splatting Pruning

Aoduo Li, Jiancheng Li, Huan Ye, Hongjian Xu +4 more

VEDAL introduces a variational, error-driven asynchronous learning framework to efficiently prune 3D Gaussian Splatting, achieving high compression ratios with minimal loss in novel view synthesis qua…

cs.CV · cs.AI · Recent · May 29, 2026

Redefining Instance Matching: A Unified Framework for Part-Aware Matching in Panoptic Segmentation Evaluation

Erik Großkopf, Soumya Snigdha Kundu, Hendrik Möller, Nicolas Münster +8 more

The paper proposes a unified framework to systematically redefine instance matching for Panoptic Quality evaluation, moving beyond the standard One-to-One matching to accommodate complex scenarios lik…

cs.CV · cs.AI · Recent · May 28, 2026

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang, Yi Ma +2 more

The paper proposes GASP, a framework that injects fundamental geometric priors directly into Vision-Language Models (VLMs) using ground-truth video geometry, significantly enhancing 3D spatial reasoni…

cs.CV · cs.RO · Recent · Jun 1, 2026

Not All Points Are Equal: Uncertainty-Aware 4D LiDAR Scene Synthesis

Xiang Xu, Alan Liang, Youquan Liu, Xian Sun +4 more

The paper introduces U4D, an uncertainty-aware framework that synthesizes 4D LiDAR scenes by prioritizing the reconstruction of geometrically difficult and uncertain regions first, leading to state-of…

cs.CV · cs.AI · Recent · May 30, 2026

Benchmarks for Vision-Language Models in Urban Perception Should Be Reliability-Aware and Negotiated

Rashid Mushkani

The paper argues that benchmarking Vision-Language Models (VLMs) for urban perception must treat human disagreement and non-response as key measurement outcomes, rather than assuming perfect consensus…
