~ similar to 2605.31094· 19 results
The paper identifies a fundamental mismatch between standard pairwise ranking metrics (like AP and FPR-95) and the true assignment objective in multi-view object association, proposing a Sinkhorn-base…
This paper investigates the application of Parameter-Efficient Fine-Tuning (PEFT) methods, specifically adapters and LoRA, to large pretrained models for instance segmentation, demonstrating that thes…
Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more
The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…
Ziying Chen, Yang Cao, He Sun, Beining Yang +1 more
The paper proposes a novel geometric embedding hashing method to recover object correspondences (vector links) between two embedding clouds generated by different black-box encoders using only a small…
Places in the Wild introduces a massive, high-resolution RAW photograph dataset of 67,574 images captured in situ across 810 locations, providing unprecedented detail for ecologically valid vision res…
The paper introduces MetricScenes, a new large-scale, in-the-wild dataset, and demonstrates that fine-tuning existing geometry models on this dataset significantly mitigates the scale-collapse problem…
The paper proposes FedSAP, a framework that stabilizes federated prototype learning by delaying global alignment and enforcing inter-class structure, significantly improving representation quality und…
The paper proposes finetuning the Segment Anything Model (SAM) using large-scale synthetic fluorescence microscopy data to achieve robust and high-performing instance segmentation of mitochondria, add…
Guangsheng Zhang, Huan Tian, Leo Zhang, Tianqing Zhu +3 more
This paper systematically revisits and expands the threat model for backdoor attacks on semantic segmentation, proposing a unified framework (BADSEG) that demonstrates severe, previously overlooked vu…
The paper proposes using Maximum Independent Set (MIS) algorithms on similarity graphs to select a maximally diverse and non-redundant subset of prompts for LLM benchmarking, achieving consistent rank…
Tengfei Zhang, Ziheng Zhao, Lisong Dai, Xiaoman Zhang +4 more
This paper introduces MedReCo and MedReCo-VLM, a framework that enables entity-aware cross-image reasoning for medical imaging, allowing AI to compare current scans with prior studies and analogous ca…
ROVER is a lightweight, learnable plugin that efficiently routes and integrates object-centric visual evidence across multiple images and objects, significantly improving performance on grounded multi…
The paper demonstrates that clinical vision-language models (VLMs) pose a significant privacy risk by allowing de-identified images to be re-linked to original reports, and proposes a targeted differe…
The paper argues that benchmarking Vision-Language Models (VLMs) for urban perception must treat human disagreement and non-response as key measurement outcomes, rather than assuming perfect consensus…
FLORO is a multimodal geospatial foundation model that learns transferable remote sensing representations from a small, diverse corpus, achieving strong performance across various sensor types and res…
The paper introduces GPIC, a massive, permissively licensed, and safety-filtered image corpus of 28 trillion pixels, designed to serve as a stable and accessible benchmark for large-scale visual gener…
The paper introduces PInVerify, an offline embodied benchmark for Active Instance Verification (AIV), a task requiring agents to actively select viewpoints to confirm if a candidate object matches a f…
The paper proposes AlignG, a method that learns context-conditioned predicate semantics by using prototype feedback to adapt relation representations based on image-specific evidence, significantly im…
Yule Liu, Yilong Yang, Jiale Teng, Hanze Jia +10 more
The paper systematically measures the risk of current image-to-3D models generating harmful geometries, finding that these models are effective at reconstruction and existing safeguards are insufficie…