Papers similar to 2606.02068

19 results

cs.GR · cs.CV · cs.LG · Recent · Jun 3, 2026

Geometry Gaussians: Decoupling Appearance and Geometry in Gaussian Splatting

The paper proposes a novel method to improve the simultaneous representation of appearance and geometry in 3D Gaussian Splatting by introducing an additional geometry opacity parameter.

cs.CV · Recent · Jun 1, 2026

VEDAL: Variational Error-Driven Asynchronous Learning for 3D Gaussian Splatting Pruning

Aoduo Li, Jiancheng Li, Huan Ye, Hongjian Xu +4 more

VEDAL introduces a variational, error-driven asynchronous learning framework to efficiently prune 3D Gaussian Splatting, achieving high compression ratios with minimal loss in novel view synthesis qua…

cs.CV · cs.CR · cs.LG · Recent · Apr 14, 2026

PatchPoison: Poisoning Multi-View Datasets to Degrade 3D Reconstruction

Prajas Wadekar, Venkata Sai Pranav Bachina, Kunal Bhosikar, Ankit Gangwal +1 more

PatchPoison introduces a lightweight dataset-poisoning method that injects small, high-frequency adversarial patches into multi-view image datasets to systematically corrupt feature matching and degra…

cs.CV · cs.AI · cs.LG · Recent · May 29, 2026

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

Ulrich Prestel, Stefan Andreas Baumann, Nick Stracke, Björn Ommer

RayDer introduces a unified, feed-forward transformer that simplifies self-supervised novel view synthesis (NVS) by consolidating camera estimation, scene reconstruction, and rendering into a single,…

cs.CV · cs.AI · Recent · Jun 3, 2026

GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes

Josef Bengtson, Yaroslava Lochman, Fredrik Kahl

GeM-NR proposes a novel, training-free framework to achieve general multi-view image editing, enabling consistent edits that drastically change both the geometry and appearance of a nonrigid scene.

cs.CV · cs.AI · Recent · May 29, 2026

Feature-Optimized Vision for Adaptive 3D Scene Reconstruction

Eric Liang

The paper introduces an adaptive feature-optimized vision front end that intelligently selects and budgets visual features for 3D reconstruction, significantly improving reconstruction quality and com…

cs.CV · cs.AI · Recent · May 29, 2026

Real2SAM2Real: Generative 3D Caches as Complementary Context for Video Diffusion

Jiayi Wu, Haoming Cai, Cornelia Fermuller, Christopher Metzler +1 more

Real2SAM2Real introduces a framework that uses explicit 3D caches, derived from 3D lifting models, to provide robust geometric guidance to Video Diffusion Models, significantly improving spatiotempora…

cs.AI · Recent · Jun 1, 2026

Bridging the Sim-to-Real Gap in Semiconductor Visual Program Synthesis via Input Binarization

Yusuke Ohtsubo, Kota Dohi, Koichiro Yawata, Koki Takeshita +1 more

The paper proposes a visual program synthesis framework using a VLM to generate accurate training data for semiconductor inspection, mitigating the sim-to-real gap by applying input binarization to st…

cs.CV · cs.AI · eess.IV · Recent · Jun 1, 2026

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization

Jingyun Liang, Min Wei, Shikai Li, Yizeng Han +4 more

The paper proposes a novel render-free framework that conditions video diffusion models directly on compressed 3D human mesh tokens, enabling robust 3D-aware human motion control without relying on re…

cs.CV · Recent · Jun 1, 2026

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…

cs.CV · cs.AI · Recent · May 30, 2026

GeoSAM-3D: Geodesic Prompt Propagation for Open-Vocabulary 3D Scene Segmentation from Monocular Video

Arun Sharma

GeoSAM-3D proposes a novel framework for open-vocabulary 3D scene segmentation from simple monocular video by propagating object prompts using a geodesic distance kernel on a reconstructed Gaussian sc…

cs.CV · Recent · Jun 1, 2026

LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models

Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more

The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…

cs.CV · Recent · Jun 1, 2026

Neural Acquisition & Representation of Subsurface Scattering

Arjun Majumdar, Raphael Braun, Hendrik Lensch

The paper introduces a method using a U-Net CNN to acquire and estimate detailed subsurface scattering properties by learning the pixel footprint response, enabling high-resolution relighting of obje…

cs.CE · Recent · May 29, 2026

CamGeo: Sparse Camera-Conditioned Image-to-Video Generation with 3D Geometry Priors

Xuanyi Liu, Deyi Ji, Liqun Liu, Lanyun Zhu +7 more

CamGeo is a novel framework that improves sparse camera-conditioned image-to-video generation by distilling rich 3D geometric priors into the diffusion backbone, resulting in geometrically consistent…

cs.CV · cs.LG · Recent · Jun 1, 2026

Hallucination-Aware Diffusion Sampling for Inverse Problems via Robust Prior Updates

Pengfei Jin, Yiqi Tian, Kailong Fan, Bingjie Qi +1 more

The paper introduces Robust Prior Update (RPU), a module that improves the faithfulness of diffusion-based inverse solvers by stabilizing the prior update step, thereby reducing measurement-conditione…

cs.CV · cs.CG · Recent · May 28, 2026

S2MDF: A Plug-And-Play Layer for Intersection-Free Multi-Object Signed Distance Fields

Deniz Sayin Mercadier, Federico Stella, Aurel Bizeau, Nicolas Talabot +1 more

The paper introduces S2MDF, a plug-and-play module that enforces a hard constraint to eliminate interpenetrations in multi-object Signed Distance Field (SDF) representations, significantly improving p…

cs.CV · Recent · Jun 1, 2026

Retrieve What's Missing: Coverage-Maximizing Retrieval for Consistent Long Video Generation

Minseok Joo, Dogyun Park, Taehoon Lee, Kyujin Lee +1 more

The paper proposes COVRAG, a depth-based memory retrieval framework that maximizes the coverage of target-view regions to significantly improve long-term geometric consistency in autoregressive long v…

cs.CV · cs.AI · cs.GR · Recent · May 28, 2026

City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images

Sayan Paul, Sourav Ghosh, Siddharth Katageri, Soumyadip Maity +2 more

City-Mesh3R is a scalable, end-to-end framework that reconstructs high-fidelity, watertight 3D surface meshes of entire city-scale environments directly from large collections of multi-view images.

cs.CV · Recent · Jun 2, 2026

PixVOD: Pixel-Distributed Direct Visual Odometry and Depth Estimation

Shinjeong Kim, Ignacio Alzugaray, Callum Rhodes, Paul H. J. Kelly +1 more

PixVOD proposes a fully parallelizable, pixel-distributed framework for visual odometry and depth estimation that performs computations directly on the sensor using Gaussian Belief Propagation.
