~ similar to 2606.02346· 19 results
The paper proposes a fast and lightweight novel view synthesis method using a differentiable Multiplane Image (MPI) representation, achieving significant speed and size improvements over state-of-the-…
The paper proposes a novel method to improve the simultaneous representation of appearance and geometry in 3D Gaussian Splatting by introducing an additional geometry opacity parameter.
GeoSAM-3D proposes a novel framework for open-vocabulary 3D scene segmentation from simple monocular video by propagating object prompts using a geodesic distance kernel on a reconstructed Gaussian sc…
PatchPoison introduces a lightweight dataset-poisoning method that injects small, high-frequency adversarial patches into multi-view image datasets to systematically corrupt feature matching and degra…
Minseok Joo, Dogyun Park, Taehoon Lee, Kyujin Lee +1 more
The paper proposes COVRAG, a depth-based memory retrieval framework that maximizes the coverage of target-view regions to significantly improve long-term geometric consistency in autoregressive long v…
Yule Liu, Yilong Yang, Jiale Teng, Hanze Jia +10 more
The paper systematically measures the risk of current image-to-3D models generating harmful geometries, finding that these models are effective at reconstruction and existing safeguards are insufficie…
Jiayi Wu, Haoming Cai, Cornelia Fermuller, Christopher Metzler +1 more
Real2SAM2Real introduces a framework that uses explicit 3D caches, derived from 3D lifting models, to provide robust geometric guidance to Video Diffusion Models, significantly improving spatiotempora…
Xiang Xu, Alan Liang, Youquan Liu, Xian Sun +4 more
The paper introduces U4D, an uncertainty-aware framework that synthesizes 4D LiDAR scenes by prioritizing the reconstruction of geometrically difficult and uncertain regions first, leading to state-of…
The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…
Minkyung Kwon, Jinhyeok Choi, Youngjin Shin, Jaeyeong Kim +2 more
MORPHOS is a novel autoregressive framework that generates dynamic 3D assets (like meshes and radiance fields) from videos by using a unified 4D representation to ensure temporal consistency and handl…
The paper introduces an adaptive feature-optimized vision front end that intelligently selects and budgets visual features for 3D reconstruction, significantly improving reconstruction quality and com…
Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang, Yi Ma +2 more
The paper proposes GASP, a framework that injects fundamental geometric priors directly into Vision-Language Models (VLMs) using ground-truth video geometry, significantly enhancing 3D spatial reasoni…
PRIMA is a framework that significantly improves 3D quadruped mesh recovery by integrating biological knowledge and a test-time adaptation strategy, achieving state-of-the-art results on diverse and c…
VISReg introduces a novel regularization technique that combines variance control with a Sliced-Wasserstein-based sketching objective to stabilize self-supervised learning, achieving state-of-the-art…
The paper introduces a subgrid marching tetrahedra scheme that accurately recovers complex, intersection-free manifold meshes from tetrahedral grids, overcoming limitations of classic marching methods…
RayDer introduces a unified, feed-forward transformer that simplifies self-supervised novel view synthesis (NVS) by consolidating camera estimation, scene reconstruction, and rendering into a single,…
Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more
The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…
Yuhang Han, Wenzheng Yang, Yujie Chen, Xiangqi Jin +3 more
STaR-KV introduces a novel, training-free KV cache compression framework that adaptively re-weights token importance across spatial, temporal, and distributional axes, significantly reducing GPU memor…
Yusuke Ohtsubo, Kota Dohi, Koichiro Yawata, Koki Takeshita +1 more
The paper proposes a visual program synthesis framework using a VLM to generate accurate training data for semiconductor inspection, mitigating the sim-to-real gap by applying input binarization to st…