~ similar to 2606.00294· 18 results
The paper proposes VISTA, a multi-level event semantics mining framework, to accurately predict complex events in long videos, addressing the limitations of current LLMs in this domain.
Yaxuan Kong, Qingren Yao, Yuqi Nie, Yichen Li +6 more
The paper introduces TimeSage-MT, a comprehensive multi-turn benchmark designed to rigorously test an LLM agent's ability to perform complex, evolving time series analysis, revealing critical gaps in…
Jingjie Lin, Bingbing Wang, Zihan Wang, Zhengda Jin +3 more
The paper introduces RefMem-Bench, a new benchmark for measuring reflective memory in long-horizon dialogue, and proposes REMIND, a framework that significantly improves models' ability to synthesize…
Xiaolin Liu, Yilun Zhu, Xiangyu Zhao, Xuehui Wang +8 more
The paper introduces Moment-Video, a new benchmark that diagnoses the ability of video MLLMs to understand brief, critical visual events, revealing that current models struggle significantly with temp…
This paper investigates improving speculative decoding for multilingual LLM inference, finding that n-gram draft models offer consistent speed-ups across languages despite lower token acceptance rates…
Kun Feng, Ziwei Shan, Yuchen Fang, Yiyang Tan +5 more
KairosAgent is a novel agentic framework that combines Large Language Models (LLMs) for semantic reasoning and Time Series Foundation Models (TSFMs) for numerical forecasting, achieving superior multi…
The study demonstrates that domain adaptation primarily reshapes the linguistic explanatory framework of language models, causing shifts in cosmological stance secondarily, rather than directly modify…
The paper introduces KnowledgeGain, a novel metric that measures the actual knowledge gained by readers from science news, and demonstrates its use in optimizing news generation to improve reader lear…
The paper investigates whether modestly sized open-source language models can grasp the semantics of rare Paired-Focus constructions, finding that understanding emerges later in training and correlate…
Md Arid Hasan, Ruwad Naswan, Farhan Samir, Sharifa Sultana +1 more
The paper demonstrates that using English prompts causes large language models to prioritize globally dominant narratives over local cultural knowledge, even when local evidence is provided.
Siyuan Qi, Xinyuan Wang, Yingxuan Yang, Haochuan Guo +4 more
DynaTree introduces a two-stage framework that pre-constructs a reusable retrieval tree offline using coordinated agents, allowing for efficient, structure-aware, and highly effective time-sensitive n…
This study systematically evaluates a wide range of chunking methods for Retrieval-Augmented Generation (RAG) to assess their effectiveness and highlight the overlooked challenges associated with chun…
Siddhesh Milind Pawar, Sarah Masud, Haneul Yoo, Alice Oh +1 more
The paper introduces FRANZ, a communicative audit framework, to evaluate how LLMs frame responses to subjective questions, finding that LLMs exhibit statistically significant and coupled differences i…
The paper introduces CARTE, a new benchmark designed to test how well large language models understand fine-grained, regionally differentiated knowledge across the 13 metropolitan regions of France, r…
The paper develops a theoretically grounded framework for evaluating multilingual LLMs in Social Sciences and Humanities, moving beyond traditional NLP benchmarks to assess interpretive validity and c…
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.
Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more
The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.
Sarmistha Das, Vaibhav Vishal, Shreyas Guha, Amaan Ali +2 more
This paper introduces a Hybrid Mixture-of-Experts (HybridMoE) framework and a specialized corpus (Varnika) to significantly improve language models' ability to understand and retain figurative, cultur…