Papers similar to 2606.05124

~ similar to 2606.05124· 18 results

cs.CVcs.AIRecentJun 1, 2026

Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image

The paper proposes a fast and lightweight novel view synthesis method using a differentiable Multiplane Image (MPI) representation, achieving significant speed and size improvements over state-of-the-…

View →

cs.CVRecentJun 1, 2026

VEDAL: Variational Error-Driven Asynchronous Learning for 3D Gaussian Splatting Pruning

Aoduo Li, Jiancheng Li, Huan Ye, Hongjian Xu +4 more

VEDAL introduces a variational, error-driven asynchronous learning framework to efficiently prune 3D Gaussian Splatting, achieving high compression ratios with minimal loss in novel view synthesis qua…

View →

cs.CVcs.CRcs.LGRecentApr 14, 2026

PatchPoison: Poisoning Multi-View Datasets to Degrade 3D Reconstruction

Prajas Wadekar, Venkata Sai Pranav Bachina, Kunal Bhosikar, Ankit Gangwal +1 more

PatchPoison introduces a lightweight dataset-poisoning method that injects small, high-frequency adversarial patches into multi-view image datasets to systematically corrupt feature matching and degra…

View →

cs.CRcs.CVRecentMay 10, 2026

On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models

Yule Liu, Yilong Yang, Jiale Teng, Hanze Jia +10 more

The paper systematically measures the risk of current image-to-3D models generating harmful geometries, finding that these models are effective at reconstruction and existing safeguards are insufficie…

View →

cs.CVcs.AIRecentMay 30, 2026

GeoSAM-3D: Geodesic Prompt Propagation for Open-Vocabulary 3D Scene Segmentation from Monocular Video

Arun Sharma

GeoSAM-3D proposes a novel framework for open-vocabulary 3D scene segmentation from simple monocular video by propagating object prompts using a geodesic distance kernel on a reconstructed Gaussian sc…

View →

cs.CVRecentJun 1, 2026

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…

View →

cs.CVcs.AIcs.RORecentMay 28, 2026

Prior Availability in Industrial Visual Sim-to-Real: A Review of CAD-Guided and CAD-Unavailable Regimes

Chenxi Tao, Seung-Kyum Choi

The paper reframes industrial visual sim-to-real transfer as a domain-gap problem categorized by the availability of explicit object geometry (CAD), arguing that the required prior evidence dictates t…

View →

cs.CVRecentJun 1, 2026

From Extrinsic to Intrinsic: Geodesic-Guided Representation Learning for 3D Geometric Data

Yuming Zhao, Junhui Hou, Qijian Zhang, Jia Qin +1 more

The paper introduces PRISM, a novel representation learning framework that learns isometric embeddings by explicitly modeling the intrinsic geodesic metric of 3D surfaces, achieving superior performan…

View →

cs.GRcs.CGRecentMay 30, 2026

Subgrid Marching Tetrahedra

Hossein Baktash, Mark Gillespie, Keenan Crane

The paper introduces a subgrid marching tetrahedra scheme that accurately recovers complex, intersection-free manifold meshes from tetrahedral grids, overcoming limitations of classic marching methods…

View →

cs.CVcs.AIRecentJun 3, 2026

GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes

Josef Bengtson, Yaroslava Lochman, Fredrik Kahl

GeM-NR proposes a novel, training-free framework to achieve general multi-view image editing, enabling consistent edits that drastically change both the geometry and appearance of a nonrigid scene.

View →

cs.CERecentMay 29, 2026

CamGeo: Sparse Camera-Conditioned Image-to-Video Generation with 3D Geometry Priors

Xuanyi Liu, Deyi Ji, Liqun Liu, Lanyun Zhu +7 more

CamGeo is a novel framework that improves sparse camera-conditioned image-to-video generation by distilling rich 3D geometric priors into the diffusion backbone, resulting in geometrically consistent…

View →

cs.CVcs.CGRecentMay 28, 2026

S2MDF: A Plug-And-Play Layer for Intersection-Free Multi-Object Signed Distance Fields

Deniz Sayin Mercadier, Federico Stella, Aurel Bizeau, Nicolas Talabot +1 more

The paper introduces S2MDF, a plug-and-play module that enforces a hard constraint to eliminate interpenetrations in multi-object Signed Distance Field (SDF) representations, significantly improving p…

View →

cs.CVcs.AIcs.GRRecentMay 28, 2026

City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images

Sayan Paul, Sourav Ghosh, Siddharth Katageri, Soumyadip Maity +2 more

City-Mesh3R is a scalable, end-to-end framework that reconstructs high-fidelity, watertight 3D surface meshes of entire city-scale environments directly from large collections of multi-view images.

View →

cs.CVRecentJun 1, 2026

Retrieve What's Missing: Coverage-Maximizing Retrieval for Consistent Long Video Generation

Minseok Joo, Dogyun Park, Taehoon Lee, Kyujin Lee +1 more

The paper proposes COVRAG, a depth-based memory retrieval framework that maximizes the coverage of target-view regions to significantly improve long-term geometric consistency in autoregressive long v…

View →

cs.CVcs.AIRecentMay 29, 2026

Feature-Optimized Vision for Adaptive 3D Scene Reconstruction

Eric Liang

The paper introduces an adaptive feature-optimized vision front end that intelligently selects and budgets visual features for 3D reconstruction, significantly improving reconstruction quality and com…

View →

cs.CVRecentJun 1, 2026

Honey, I Shrunk the Arc de Triomphe!

Yuanbo Xiangli, Hanyu Chen, Xueqing Tsang, Noah Snavely

The paper introduces MetricScenes, a new large-scale, in-the-wild dataset, and demonstrates that fine-tuning existing geometry models on this dataset significantly mitigates the scale-collapse problem…

View →

cs.CVcs.AIRecentMay 28, 2026

GPIC: A Giant Permissive Image Corpus for Visual Generation

Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal, Michael Jang +5 more

The paper introduces GPIC, a massive, permissively licensed, and safety-filtered image corpus of 28 trillion pixels, designed to serve as a stable and accessible benchmark for large-scale visual gener…

View →

cs.AIRecentJun 1, 2026

Bridging the Sim-to-Real Gap in Semiconductor Visual Program Synthesis via Input Binarization

Yusuke Ohtsubo, Kota Dohi, Koichiro Yawata, Koki Takeshita +1 more

The paper proposes a visual program synthesis framework using a VLM to generate accurate training data for semiconductor inspection, mitigating the sim-to-real gap by applying input binarization to st…

View →