Papers similar to 2605.28035

~ similar to 2605.28035· 10 results

cs.GRcs.AIcs.CVRecentMay 31, 2026

Temporally-Aligned Evaluation for Audio-Driven Talking Head Generation

Zhicheng Zhang, Lei Wang, Yu Zhang, Yongsheng Gao

The paper proposes a sequence-alignment framework using Soft Dynamic Time Warping to evaluate audio-driven talking-head generation, demonstrating that this approach provides more robust and fair compa…

View →

cs.CVRecentJun 1, 2026

LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models

Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more

The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…

View →

cs.CLcs.AIRecentMay 27, 2026

KVoiceBench, KOpenAudioBench, and KMMAU: Agent-Driven Korean Speech Benchmarks for Evaluating SpeechLMs

Haechan Kim, Seungjun Chung, Inkyu Park, Jihoo Lee +1 more

The paper introduces three new Korean speech benchmarks (KVoiceBench, KOpenAudioBench, and KMMAU) to evaluate SpeechLMs, demonstrating that English-centric evaluation fails to capture performance gaps…

View →

cs.CVcs.AIcs.CLRecentJun 1, 2026

Jailbreaking Multimodal Large Language Models using Multi-Clip Video

Choongwon Kang, Seungjong Sun, Hyunmin Jun, Jang Hyun Kim

The paper introduces Multi-Clip Video (MCV) SafetyBench, a dataset demonstrating that the vulnerability of Multimodal Large Language Models (MLLMs) to jailbreaking increases with the diversity and num…

View →

cs.AIRecentJun 1, 2026

Community-Aware Assessment of Social Textual Engagement and Resonance: A Human-Centric Perspective on User-Generated Content Evaluation

Tianjiao Li, Kai Zhao, Xiang Li, Yang Liu +1 more

The paper introduces CASTER, a new human-centric task for evaluating User-Generated Content (UGC) resonance, and proposes MEDEA, an architecture that uses a Social Chain-of-Thought mechanism to simula…

View →

cs.SDcs.AIcs.CVRecentJun 1, 2026

JenBridge: Adaptive Long-Form Video Soundtracking across Scene Transitions

Jiashuo Yu, Yao Yao, Boyu Chen, Alex Wang

JenBridge is a novel, adaptive framework that generates high-fidelity, long-form video soundtracks, significantly improving narrative coherence and naturalness across scene transitions.

View →

cs.CRcs.AIcs.MMRecentMar 23, 2026

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee

This paper introduces ComicJailbreak, a new benchmark demonstrating that structured visual narratives can effectively jailbreak Multimodal Large Language Models (MLLMs), requiring new safety alignment…

View →

cs.AIcs.LGRecentMay 29, 2026

TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation

Kaixiang Zhao, Tianrun Yu, Shawn Huang, Porter Jenkins +2 more

TIGER is an inference-time framework that uses graph-based evidence routing to independently assess and repair unsupported facts (hallucinations) in multimodal generation.

View →

cs.CLcs.LGRecentMay 30, 2026

Towards Lightweight Reliability: Using Soft Prompts for Hallucination Mitigation in Large Language Models

S M Tahmid Siddiqui, Akib Jawad Ononto, Anoop Singhal, Latifur Khan

The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…

View →

cs.CLcs.AIeess.ASRecentMay 31, 2026

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more

PolySpeech-100 introduces a massive, multi-lingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…

View →