ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2606.02535 · 17 results

cs.CV · cs.AI · Recent · May 30, 2026

Benchmarks for Vision-Language Models in Urban Perception Should Be Reliability-Aware and Negotiated

Rashid Mushkani

The paper argues that benchmarking Vision-Language Models (VLMs) for urban perception must treat human disagreement and non-response as key measurement outcomes, rather than assuming perfect consensus…

cs.CV · cs.AI · cs.CL · Recent · Jun 1, 2026

The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue

Sherzod Hakimov, Mattia D'Agostini, Ivan Samodelkin, David Schlangen

The paper introduces the Image Reconstruction Game, a benchmark showing that the quality of the descriptive model is the primary determinant of image reconstruction success, while the generator's role…

cs.CV · cs.AI · cs.CL · Recent · May 31, 2026

On the Limits of Token Reduction for Efficient Unified Vision Language Training

Siyi Chen, Weiming Zhuang, Jingtao Li, Lingjuan Lv

The paper analyzes token reduction for efficient unified VLM training, finding that while task-specific acceleration saves computation, it destroys the mutual performance gains achieved through joint…

cs.CV · Recent · Jun 1, 2026

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…

cs.CV · cs.AI · cs.GR · Recent · May 31, 2026

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more

The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.

cs.CV · cs.CL · Recent · May 29, 2026

Learning from Fine-Grained Visual Discrepancies: Mitigating Multimodal Hallucinations via In-Context Visual Contrastive Optimization

Haolin Deng, Xin Zou, Zhiwei Jin, Chen Chen +2 more

The paper proposes In-Context Visual Contrastive Optimization (IC-VCO) to rigorously mitigate multimodal hallucinations in Vision-Language Models by optimizing contrastive learning within a shared mul…

cs.AI · cs.MM · cs.SD · Recent · May 27, 2026

MTAVG-Bench 2.0: Diagnosing Failure Modes of Cinematic Expressiveness in Multi-Talker Audio-Video Generation

Haitian Li, Yanghao Zhou, Heyan Huang, Liangji Chen +14 more

The paper introduces MTAVG-Bench 2.0, a new benchmark designed to diagnose high-level failure modes of cinematic expressiveness in multi-talker audio-video generation, showing that even advanced model…

cs.CV · cs.AI · Recent · May 28, 2026

GPIC: A Giant Permissive Image Corpus for Visual Generation

Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal, Michael Jang +5 more

The paper introduces GPIC, a massive, permissively licensed, and safety-filtered image corpus of 28 trillion pixels, designed to serve as a stable and accessible benchmark for large-scale visual gener…

cs.CV · cs.AI · Recent · May 29, 2026

Redefining Instance Matching: A Unified Framework for Part-Aware Matching in Panoptic Segmentation Evaluation

Erik Großkopf, Soumya Snigdha Kundu, Hendrik Möller, Nicolas Münster +8 more

The paper proposes a unified framework to systematically redefine instance matching for Panoptic Quality evaluation, moving beyond the standard One-to-One matching to accommodate complex scenarios lik…

cs.CL · cs.LG · Recent · May 30, 2026

Towards Lightweight Reliability: Using Soft Prompts for Hallucination Mitigation in Large Language Models

S M Tahmid Siddiqui, Akib Jawad Ononto, Anoop Singhal, Latifur Khan

The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…

cs.CV · Recent · Jun 1, 2026

Chroma Clues: Leveraging Color Statistics to Detect Synthetic Images

Lea Uhlenbrock, Davide Cozzolino, Christian Riess

This paper proposes using color statistics, specifically through novel color transformations, to detect AI-generated synthetic images by exploiting the color-imitation weaknesses of current generative…

cs.CR · Recent · Apr 12, 2026

SEED: A Large-Scale Benchmark for Provenance Tracing in Sequential Deepfake Facial Edits

Mengieong Hoi, Zhedong Zheng, Ping Liu, Wei Liu

The paper introduces SEED, a large-scale benchmark dataset for tracing sequential deepfake facial edits, and proposes FAITH, a frequency-aware Transformer model that effectively detects and orders the…

cs.CV · cs.AI · Recent · May 28, 2026

Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering

Soumyadeep Jana, Pulkit Mittal, Sanasam Ranbir Singh

The paper proposes BRACS, a training-free steering framework that adaptively corrects visual grounding failures in large vision-language models, significantly reducing object hallucination without sac…

cs.CV · cs.AI · Recent · May 29, 2026

Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models

Jiazheng Xing, Hangjie Yuan, Lingling Cai, Xinyu Liu +8 more

Lumos-Nexus is a training-efficient framework that enhances video generation quality by progressively bridging generation from a lightweight model to a high-fidelity generator in a shared latent space…

cs.LG · cs.AI · stat.ML · Recent · May 28, 2026

CalArena: A Large-Scale Post-Hoc Calibration Benchmark

Eugène Berta, David Holzmüller, Francis Bach, Michael I. Jordan

The paper introduces CalArena, a large-scale, standardized benchmark covering nearly 2000 experiments to comprehensively evaluate post-hoc calibration methods, finding that smooth calibration function…

cs.CV · cs.CR · cs.SI · Recent · May 14, 2026

Can Visual Mamba Improve AI-Generated Image Detection? An In-Depth Investigation

Mamadou Keita, Wassim Hamidouche, Hessen Bougueffa Eutamene, Abdelmalik Taleb-Ahmed +2 more

This study systematically evaluates Vision Mamba models for detecting AI-generated images, finding that while they show promise, their current strengths and limitations must be understood relative to…

cs.CV · cs.AI · Empirical · Recent · Jun 10, 2026

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

Cheng-Yu Yang, Shao-Yuan Lo, Yu-Lun Liu

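The paper replaces the permanent removal of visual tokens with recoverable routing to improve the performance of vision-language models.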
