Similar to 2606.02268 · 20 results
Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang, Yi Ma +2 more
The paper proposes GASP, a framework that injects fundamental geometric priors directly into Vision-Language Models (VLMs) using ground-truth video geometry, significantly enhancing 3D spatial reasoni…
The paper introduces S2MDF, a plug-and-play module that enforces a hard constraint to eliminate interpenetrations in multi-object Signed Distance Field (SDF) representations, significantly improving p…
The paper introduces MetricScenes, a new large-scale, in-the-wild dataset, and demonstrates that fine-tuning existing geometry models on this dataset significantly mitigates the scale-collapse problem…
The paper introduces a subgrid marching tetrahedra scheme that accurately recovers complex, intersection-free manifold meshes from tetrahedral grids, overcoming limitations of classic marching methods…
Ziying Chen, Yang Cao, He Sun, Beining Yang +1 more
The paper proposes a novel geometric embedding hashing method to recover object correspondences (vector links) between two embedding clouds generated by different black-box encoders using only a small…
Steffen Knoblauch, Hao Li, Gengchen Mai, Konstantin Klemmer +2 more
The paper advocates for a paradigm shift toward joint Spatial Representation Learning (SRL) that unifies raster imagery and structured vector data into a single embedding space for developing more sem…
The paper introduces Geodesic Flow Matching, a manifold-aware denoising technique that adapts Riemannian transport dynamics to accurately clean high-dimensional structured representations like Spatial…
Reasmory introduces a structured programming framework that uses explicit 3D memory and a Domain-Specific Language (DSL) to reliably enhance Vision-Language Models' spatial reasoning capabilities, ach…
Yule Liu, Yilong Yang, Jiale Teng, Hanze Jia +10 more
The paper systematically measures the risk of current image-to-3D models generating harmful geometries, finding that these models are effective at reconstruction and existing safeguards are insufficie…
Panfei Cheng, Hongshan Yu, Wenrui Chen, Xiaojun Tang +2 more
The paper proposes a novel symmetry-aware, category-level method for 9D object pose estimation that accurately estimates translation and size first, followed by rotation, achieving state-of-the-art re…
This paper demonstrates that large language models spontaneously develop geometric structures corresponding to human perceptual domains (like color or pitch) within their internal layers, suggesting t…
Ei Hmue Khine, Yao Li, Jiebao Sun, Shengzhu Shi +2 more
The paper proposes Latent Geometric Chords (LGC) and LGC-H, a novel method that navigates decision boundaries using curvature-aware geometric search within a semantic manifold to generate high-fidelit…
Zhipeng Cai, Zhuang Liu, Yunyang Xiong, Zechun Liu +2 more
The paper proposes VLM3, a simple, scalable method that demonstrates standard Vision Language Models (VLMs) can natively learn 3D understanding by focusing on architectural simplicity and specific dat…
Jiawei Li, Ziyi Liu, Weijie Shi, Long Chen +2 more
SSR3D-LLM introduces a structured spatial reasoning interface for unified 3D-LLMs, allowing fine-grained object grounding by generating and processing sequential latent spatial steps.
Shashi Kumar, Yacouba Kaloga, Petr Motlicek, Ina Kodrasi +1 more
The paper introduces Geometric Latent Reasoning (GLR), a method that models reasoning as continuous paths in the embedding space, showing that this continuous approach allows LLMs to solve problems us…
The paper proposes a Transformer-based end-to-end architecture to reconstruct 3D house roof wireframes from sparse point clouds and semantic data, achieving state-of-the-art results on the S23DR Chall…
The paper proposes xModel-KD, a cross-modal knowledge distillation framework, to improve 3D point cloud segmentation by effectively transferring rich appearance cues from 2D images to sparse 3D geomet…
GeoSAM-3D proposes a novel framework for open-vocabulary 3D scene segmentation from simple monocular video by propagating object prompts using a geodesic distance kernel on a reconstructed Gaussian sc…
Shaohui Dai, Yansong Qu, You Shen, Shengchuan Zhang +1 more
The paper introduces PAR3D, a unified part-aware 3D-MLLM framework, to enhance 3D scene understanding by enabling models to reason about and ground both whole objects and their fine-grained parts.
VISReg introduces a novel regularization technique that combines variance control with a Sliced-Wasserstein-based sketching objective to stabilize self-supervised learning, achieving state-of-the-art…