Similar to 2606.02406 · 19 results
Sayan Paul, Sourav Ghosh, Siddharth Katageri, Soumyadip Maity +2 more
City-Mesh3R is a scalable, end-to-end framework that reconstructs high-fidelity, watertight 3D surface meshes of entire city-scale environments directly from large collections of multi-view images.
Yuming Zhao, Junhui Hou, Qijian Zhang, Jia Qin +1 more
The paper introduces PRISM, a novel representation learning framework that learns isometric embeddings by explicitly modeling the intrinsic geodesic metric of 3D surfaces, achieving superior performan…
The paper proposes xModel-KD, a cross-modal knowledge distillation framework, to improve 3D point cloud segmentation by effectively transferring rich appearance cues from 2D images to sparse 3D geomet…
The paper shows that simple, non-architectural enhancements, such as adding semantic pseudo-labels and visibility information, can significantly boost LiDAR Semantic Scene Completion performance.
The paper introduces an adaptive feature-optimized vision front end that intelligently selects and budgets visual features for 3D reconstruction, significantly improving reconstruction quality and com…
The paper introduces S2MDF, a plug-and-play module that enforces a hard constraint to eliminate interpenetrations in multi-object Signed Distance Field (SDF) representations, significantly improving p…
The paper introduces a subgrid marching tetrahedra scheme that accurately recovers complex, intersection-free manifold meshes from tetrahedral grids, overcoming limitations of classic marching methods…
Minkyung Kwon, Jinhyeok Choi, Youngjin Shin, Jaeyeong Kim +2 more
MORPHOS is a novel autoregressive framework that generates dynamic 3D assets (such as meshes and radiance fields) from videos by using a unified 4D representation to ensure temporal consistency and handl…
Shuo Lu, Yinuo Xu, Kecheng Yu, Siru Jiang +7 more
The paper introduces WorldCoder-Bench, a comprehensive benchmark and evaluation protocol for testing LLMs' ability to autonomously generate complex, physically grounded, and interactive 3D web worlds.
The paper proposes a disentangled representation framework to significantly improve few-shot layout-to-image generation by separating semantic identity from local visual details, thereby mitigating re…
The paper proposes an uncertainty-aware transfer learning framework using the Temporal Fusion Transformer (TFT) to achieve robust and scalable energy forecasting across different buildings, demonstrat…
PRIMA is a framework that significantly improves 3D quadruped mesh recovery by integrating biological knowledge and a test-time adaptation strategy, achieving state-of-the-art results on diverse and c…
CIPER proposes a unified transformer framework to simultaneously perform cross-view image retrieval and precise 3-DoF pose estimation, overcoming the limitations of cascaded, separate methods.
Steffen Knoblauch, Hao Li, Gengchen Mai, Konstantin Klemmer +2 more
The paper advocates for a paradigm shift toward joint Spatial Representation Learning (SRL) that unifies raster imagery and structured vector data into a single embedding space for developing more sem…
The paper introduces a Mixture-Density Representation (MDA) to model depth ambiguity, effectively eliminating 'flying-point' artifacts at object boundaries by allowing pixels to predict multiple possi…
Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang, Yi Ma +2 more
The paper proposes GASP, a framework that injects fundamental geometric priors directly into Vision-Language Models (VLMs) using ground-truth video geometry, significantly enhancing 3D spatial reasoni…
Ziying Chen, Yang Cao, He Sun, Beining Yang +1 more
The paper proposes a novel geometric embedding hashing method to recover object correspondences (vector links) between two embedding clouds generated by different black-box encoders using only a small…
The paper introduces MetricScenes, a new large-scale, in-the-wild dataset, and demonstrates that fine-tuning existing geometry models on this dataset significantly mitigates the scale-collapse problem…
Sebastian Cavada, Soumava Paul, Tuan-Hung Vu, Andrei Bursuc +1 more
The paper introduces NewtPhys, a novel 4D dataset of real-world scenes with dense physical annotations, to systematically evaluate and reveal the limitations of foundation models in low-level Newtonia…