~ similar to 2605.29539· 18 results
Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more
The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…
The paper proposes AlignG, a method that learns context-conditioned predicate semantics by using prototype feedback to adapt relation representations based on image-specific evidence, significantly im…
The paper proposes a unified, architecture-agnostic framework that significantly improves the robustness of deepfake image detectors against adversarial attacks by focusing on higher-order frequency s…
The paper proposes FedSAP, a framework that stabilizes federated prototype learning by delaying global alignment and enforcing inter-class structure, significantly improving representation quality und…
The paper analyzes token reduction for efficient unified VLM training, finding that while task-specific acceleration saves computation, it destroys the mutual performance gains achieved through joint…
The paper identifies a fundamental mismatch between standard pairwise ranking metrics (like AP and FPR-95) and the true assignment objective in multi-view object association, proposing a Sinkhorn-base…
The paper proposes pretraining a Perceiver-style in-context learner on synthetic data to solve Multiple Instance Learning (MIL) tasks efficiently in the low-label regime.
Panfei Cheng, Hongshan Yu, Wenrui Chen, Xiaojun Tang +2 more
The paper proposes a novel symmetry-aware, category-level method for 9D object pose estimation that accurately estimates translation and size first, followed by rotation, achieving state-of-the-art re…
The paper introduces TGIF2, an extended dataset and benchmark that evaluates the forensic robustness of image forgery detection methods against modern, advanced text-guided inpainting techniques.
Yu Xue, Haoxuan Qu, Zhuoling Li, Yihang Lou +3 more
The paper introduces ToolFG, a novel tool-integrated MLLM framework that enhances fine-grained image classification by enabling models to autonomously use external tools to gather verifiable visual cu…
The paper introduces trust functions to filter weak supervision labels, enabling near-lossless weak-to-strong generalization by selectively training a strong student using only the most reliable weak…
BayesNCL introduces a probabilistic gating mechanism to resolve the optimization conflict in Contrastive Learning, leading to highly disentangled and semantically consistent representations.
The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…
Debopam Sanyal, Anantharaman Iyer, Alind Khare, Trisha Jain +4 more
KLAS introduces a novel framework that uses KL divergence to automatically select optimal pairs of pretrained models for stitching, significantly improving the accuracy-efficiency tradeoff of resultin…
The paper introduces Involuntary In-Context Learning (IICL), an effective few-shot pattern completion attack that can bypass safety alignments in large language models, achieving a 24.0% bypass rate a…
The paper introduces a novel two-stage framework to achieve robust, category-agnostic object localization in-context (ICL) by optimizing attention and minimizing localization error using reinforcement…
This study comparatively evaluates four CNN architectures (VGG16, ResNet50, EfficientNetB0, and XceptionNet) for fake image detection, finding VGG16 achieved the highest accuracy (91%).
The paper proposes BRACS, a training-free steering framework that adaptively corrects visual grounding failures in large vision-language models, significantly reducing object hallucination without sac…