~ similar to 2605.30344· 18 results
The paper introduces a structured benchmark (TGAD) showing that current text-guided anomaly detection models often overstate their language conditioning, as performance significantly degrades when the…
ChronosAD introduces a novel architecture that uses time series foundation models and a custom Temporal Block to achieve robust and highly accurate anomaly detection across diverse domains.
Xiaolin Liu, Yilun Zhu, Xiangyu Zhao, Xuehui Wang +8 more
The paper introduces Moment-Video, a new benchmark that diagnoses the ability of video MLLMs to understand brief, critical visual events, revealing that current models struggle significantly with temp…
Yaxuan Kong, Qingren Yao, Yuqi Nie, Yichen Li +6 more
The paper introduces TimeSage-MT, a comprehensive multi-turn benchmark designed to rigorously test an LLM agent's ability to perform complex, evolving time series analysis, revealing critical gaps in…
Eric Onyame, Runtao Zhou, Kowshik Thopalli, Bhavya Kailkhura +1 more
This study demonstrates that Chain-of-Thought (CoT) monitoring is fundamentally fragile and unreliable for detecting misaligned behavior across typologically diverse languages, especially in low-resou…
The paper addresses the difficulty of using general vision-language models (VLMs) for fine-grained driver behavior recognition by creating a new, richly described dataset and demonstrating that fine-t…
Xunguang Wang, Yuguang Zhou, Qingyue Wang, Zongjie Li +4 more
This paper introduces a novel framework, the Reasoning Safety Monitor, to detect and prevent logical inconsistencies and adversarial manipulations within the internal reasoning steps of large language…
The paper introduces the DECK taxonomy, a novel framework that classifies LLM hallucinations not by their content error, but by their detectability signature based on inter-sample consistency and toke…
The paper systematically evaluates concept-based explainability in MLLMs, finding that forcing models to generate formal explanations degrades predictive accuracy, suggesting that explaining is genuin…
The paper identifies a universal, statistically predictable distribution (Mandelbrot) governing LLM outputs, enabling a highly efficient, model-agnostic scoring primitive for provenance and quality as…
The paper introduces GRIDS, a framework using Local Intrinsic Dimensionality (LID) to detect anomalies in self-supervised speech model representations, showing that LID elevation correlates with ASR d…
Jiahe Guo, Xiangran Guo, Jiaxuan Chen, Weixiang Zhao +5 more
This paper introduces the concept of Safety Geometry Collapse, demonstrating that multimodal inputs degrade the safety separation of LLMs, and proposes ReGap, a training-free method that adaptively co…
The paper introduces two methods, ermodel and ermodel, to significantly reduce hallucinations in clinical summarization by using hallucination detectors to guide iterative revisions and subsequently…
Yuefeng Peng, Mingzhe Li, Kejing Xia, Renhao Zhang +1 more
This paper presents the first systematic study of membership inference attacks (MIAs) against Vision-Language-Action (VLA) models, demonstrating that these models are highly vulnerable to privacy brea…
The paper introduces Rate Matching Consistency Training (RMCT), a novel method that improves model robustness against extraneous input cues without forcing the model to ignore those cues, thus preserv…
The paper introduces a robust, two-part framework (HyPE and HyPS) using hyperbolic geometry to efficiently detect and sanitize malicious prompts targeting Vision-Language Models (VLMs).
The paper proposes the Triple-tier Anomaly Defense (TRIAD) framework, a predictive model that treats safety verification as a dynamic trajectory problem to detect cumulative, cross-modal poisoning in…
The paper introduces CosmicFish-HRM, a compact language model that achieves adaptive reasoning by dynamically allocating computational effort through a Hierarchical Reasoning Module (HRM), showing tha…