~ similar to 2605.31069· 15 results
Yinsong Xu, Wei Jing, Liuxin Zhang, Wanjun Lv +1 more
The paper proposes a unified framework that decouples long-video reasoning into semantic and visual evidence, significantly improving performance on the HD-EPIC VQA Challenge.
Xiaolin Liu, Yilun Zhu, Xiangyu Zhao, Xuehui Wang +8 more
The paper introduces Moment-Video, a new benchmark that diagnoses the ability of video MLLMs to understand brief, critical visual events, revealing that current models struggle significantly with temp…
Qixin Hu, Shuai Yang, Wei Huang, Song Han +1 more
LongLive-RAG proposes a novel Retrieval-Augmented Generation (RAG) framework to stabilize and improve the quality of long-horizon video generation by treating the entire generated history as a searcha…
The paper introduces and analyzes 'temporal framing,' defining it as the persuasive use of time-related language in news, and demonstrates that this framing can be effectively detected using supervise…
LongTraceRL addresses long-context reasoning challenges by generating highly challenging training data and introducing a fine-grained rubric reward, significantly improving evidence-grounded reasoning…
This paper demonstrates that fine-tuning small language models (SLMs) on a synthetic, solution-rich Windows event log dataset allows them to outperform larger LLMs in identifying issues and providing…
Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai +1 more
The paper proposes a training-free framework, Visual Representation-Guided Video-LLM Reasoning, to perform composed video retrieval by using visual examples and text instructions, achieving strong per…
The paper systematically evaluates concept-based explainability in MLLMs, finding that forcing models to generate formal explanations degrades predictive accuracy, suggesting that explaining is genuin…
Zhida Zhang, Jie Ma, Zhan Peng, Haoxue Wu +4 more
SmartDirector is a novel framework that significantly improves cinematic video generation by using multiple keyframes to provide precise control over narrative structure and temporal pacing.
The paper proposes a market-feedback adaptive retrieval system for frozen LLMs in financial RAG, significantly improving event-impact prediction by learning which evidence sources to prioritize.
The paper proposes BitTP, a lightweight bitlinear architecture that quantizes LLM-based trajectory predictors to 1.58-bit weights while keeping activations full-precision, enabling high-performance de…
Yue Feng, Jingjing Li, Qijia Lu, Wei Ji +8 more
This paper addresses the challenge of detecting and explaining AI-manipulated segments within long, untrimmed videos by proposing a new benchmark and a coarse-to-fine forensic detection framework.
SHARP proposes a novel sleep-based hierarchical replay framework to efficiently learn long-range non-stationary temporal patterns in streaming data, achieving improved context retention and predictive…
Hyeonjeong Ha, Jeonghwan Kim, Cheng Qian, Jiayu Liu +6 more
MemGuard introduces a type-aware memory framework to prevent heterogeneous memory contamination in long-term memory-augmented LLMs, significantly improving memory reliability and efficiency.
Hui Yang, Daiwei He, Kevin Jiang, Taejin Park +19 more
The paper introduces a novel paradigm where a fine-tuned LLM acts as an ancillary predictor to forecast likely advertisers, significantly improving ad recommendation systems by augmenting candidate ge…