Similar to 2606.00844 · 19 results
The paper introduces a novel two-stage framework to achieve robust, category-agnostic object localization in-context (ICL) by optimizing attention and minimizing localization error using reinforcement…
CIPER proposes a unified transformer framework to simultaneously perform cross-view image retrieval and precise 3-DoF pose estimation, overcoming the limitations of cascaded, separate methods.
The paper proposes a unified framework to systematically redefine instance matching for Panoptic Quality evaluation, moving beyond the standard One-to-One matching to accommodate complex scenarios lik…
The paper identifies a fundamental mismatch between standard pairwise ranking metrics (like AP and FPR-95) and the true assignment objective in multi-view object association, proposing a Sinkhorn-base…
Jiaxi Liu, Hangyu Li, Yang Cheng, Rui Gana +6 more
The paper proposes a pose-conditioned, permutation-equivariant denoiser to accurately reconstruct work zone geometry using noisy Ultra-Wideband (UWB) range data from connected and autonomous vehicles…
Panfei Cheng, Hongshan Yu, Wenrui Chen, Xiaojun Tang +2 more
The paper proposes a novel symmetry-aware, category-level method for 9D object pose estimation that accurately estimates translation and size first, followed by rotation, achieving state-of-the-art re…
Jiacong Liu, Shu Luo, Yikai Qin, Yaze Zhao +2 more
GiPL proposes a novel two-branch framework combining iterative pseudo-label self-training and generative data augmentation to significantly improve Cross-Domain Few-Shot Object Detection by better uti…
The paper proposes FedSAP, a framework that stabilizes federated prototype learning by delaying global alignment and enforcing inter-class structure, significantly improving representation quality und…
The paper proposes xModel-KD, a cross-modal knowledge distillation framework, to improve 3D point cloud segmentation by effectively transferring rich appearance cues from 2D images to sparse 3D geomet…
The paper proposes Energy-Aware NECO, a single-pass hybrid detector that combines geometric ratio and logit-based energy scores to achieve superior pixel-wise out-of-distribution detection for semanti…
The paper introduces S2MDF, a plug-and-play module that enforces a hard constraint to eliminate interpenetrations in multi-object Signed Distance Field (SDF) representations, significantly improving p…
Ziying Chen, Yang Cao, He Sun, Beining Yang +1 more
The paper proposes a novel geometric embedding hashing method to recover object correspondences (vector links) between two embedding clouds generated by different black-box encoders using only a small…
The paper shows that simple, non-architectural enhancements, such as adding semantic pseudo-labels and visibility information, can significantly boost LiDAR Semantic Scene Completion performance.
The paper proposes AlignG, a method that learns context-conditioned predicate semantics by using prototype feedback to adapt relation representations based on image-specific evidence, significantly im…
This paper presents an open-source computer vision pipeline for classifying vehicle body types from naturalistic roadway video.
The paper demonstrates a coordinated, cross-modal spoofing attack that successfully deceives state-of-the-art multi-sensor fusion systems in autonomous vehicles by making multiple sensors agree on a f…
The paper introduces CalArena, a large-scale, standardized benchmark covering nearly 2000 experiments to comprehensively evaluate post-hoc calibration methods, finding that smooth calibration function…
Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang, Yi Ma +2 more
The paper proposes GASP, a framework that injects fundamental geometric priors directly into Vision-Language Models (VLMs) using ground-truth video geometry, significantly enhancing 3D spatial reasoni…
This paper investigates the application of Parameter-Efficient Fine-Tuning (PEFT) methods, specifically adapters and LoRA, to large pretrained models for instance segmentation, demonstrating that thes…