~ similar to 2605.29761· 20 results
Yuming Zhao, Junhui Hou, Qijian Zhang, Jia Qin +1 more
The paper introduces PRISM, a novel representation learning framework that learns isometric embeddings by explicitly modeling the intrinsic geodesic metric of 3D surfaces, achieving superior performan…
Yule Liu, Yilong Yang, Jiale Teng, Hanze Jia +10 more
The paper systematically measures the risk of current image-to-3D models generating harmful geometries, finding that these models are effective at reconstruction and existing safeguards are insufficie…
The paper introduces a subgrid marching tetrahedra scheme that accurately recovers complex, intersection-free manifold meshes from tetrahedral grids, overcoming limitations of classic marching methods…
Ziying Chen, Yang Cao, He Sun, Beining Yang +1 more
The paper proposes a novel geometric embedding hashing method to recover object correspondences (vector links) between two embedding clouds generated by different black-box encoders using only a small…
The paper introduces MetricScenes, a new large-scale, in-the-wild dataset, and demonstrates that fine-tuning existing geometry models on this dataset significantly mitigates the scale-collapse problem…
The paper proposes xModel-KD, a cross-modal knowledge distillation framework, to improve 3D point cloud segmentation by effectively transferring rich appearance cues from 2D images to sparse 3D geomet…
Xiang Xu, Alan Liang, Youquan Liu, Xian Sun +4 more
The paper introduces U4D, an uncertainty-aware framework that synthesizes 4D LiDAR scenes by prioritizing the reconstruction of geometrically difficult and uncertain regions first, leading to state-of…
GeoSAM-3D proposes a novel framework for open-vocabulary 3D scene segmentation from simple monocular video by propagating object prompts using a geodesic distance kernel on a reconstructed Gaussian sc…
Sayan Paul, Sourav Ghosh, Siddharth Katageri, Soumyadip Maity +2 more
City-Mesh3R is a scalable, end-to-end framework that reconstructs high-fidelity, watertight 3D surface meshes of entire city-scale environments directly from large collections of multi-view images.
T2Mo is a novel framework that generates controllable dynamic 3D object shapes by combining explicit 3D trajectories for spatial guidance with natural language text semantics.
Jingyun Liang, Min Wei, Shikai Li, Yizeng Han +4 more
The paper proposes a novel render-free framework that conditions video diffusion models directly on compressed 3D human mesh tokens, enabling robust 3D-aware human motion control without relying on re…
Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more
The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.
GeM-NR proposes a novel, training-free framework to achieve general multi-view image editing, enabling consistent edits that drastically change both the geometry and appearance of a nonrigid scene.
The paper reframes industrial visual sim-to-real transfer as a domain-gap problem categorized by the availability of explicit object geometry (CAD), arguing that the required prior evidence dictates t…
Shuo Lu, Yinuo Xu, Kecheng Yu, Siru Jiang +7 more
The paper introduces WorldCoder-Bench, a comprehensive benchmark and evaluation protocol for testing LLMs' ability to autonomously generate complex, physically grounded, and interactive 3D web worlds.
The paper demonstrates a coordinated, cross-modal spoofing attack that successfully deceives state-of-the-art multi-sensor fusion systems in autonomous vehicles by making multiple sensors agree on a f…
PatchPoison introduces a lightweight dataset-poisoning method that injects small, high-frequency adversarial patches into multi-view image datasets to systematically corrupt feature matching and degra…
The paper introduces a Mixture-Density Representation (MDA) to model depth ambiguity, effectively eliminating 'flying-point' artifacts at object boundaries by allowing pixels to predict multiple possi…
Ei Hmue Khine, Yao Li, Jiebao Sun, Shengzhu Shi +2 more
The paper proposes Latent Geometric Chords (LGC) and LGC-H, a novel method that navigates decision boundaries using curvature-aware geometric search within a semantic manifold to generate high-fidelit…
Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Antonios Argyriou +2 more
The paper introduces a novel third-order, rotation-invariant spherical bispectrum for watermarking panoramic images, enabling reliable watermark embedding and extraction under arbitrary 3D rotations.