Papers similar to 2605.29243

~ similar to 2605.29243· 19 results

cs.CLcs.CRRecentMay 18, 2026

Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics

Maciej Chrabąszcz, Aleksander Szymczyk, Marcin Sendera, Tomasz Trzciński +1 more

The paper introduces 'probe trajectories'—a continuous measure of a concept's probability across a model's reasoning process—to improve the monitoring of Large Reasoning Models' future behavior, showi…

View →

cs.AIRecentMay 30, 2026

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more

The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…

View →

cs.CRRecentApr 17, 2026

Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints

Cedric Bonhomme, Alexandre Dulaunoy

The paper investigates forecasting sparse and bursty vulnerability sightings, concluding that traditional time-series models like SARIMAX are inadequate, and count-based methods like Poisson regressio…

View →

cs.CLRecentMay 31, 2026

Lost in Delusion: Examining LLM Safety Under User Delusions and Distress

Andrew Aquilina, Chetna Nihalani, Vasudha Varadarajan, Nathan S. Fishbein +2 more

The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…

View →

cs.CRcs.AIRecentMar 28, 2026

SafetyDrift: Predicting When AI Agents Cross the Line Before They Actually Do

Aditya Dhodapkar, Farhaan Pishori

The paper introduces SafetyDrift, a predictive model that forecasts when AI agents will violate safety protocols by analyzing the cumulative risk across sequences of individually safe actions.

View →

cs.CLcs.AIcs.CRRecentMay 7, 2026

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

Xinjie Shen, Rongzhe Wei, Peizhi Niu, Haoyu Wang +5 more

The paper introduces TurnGate, a response-aware defense mechanism that detects the earliest turn in a multi-turn dialogue where the accumulated interaction enables a harmful action, significantly impr…

View →

cs.AIcs.CLcs.CRRecentMay 30, 2026

Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults

Rana Muhammad Usman

The paper demonstrates that the order and content of external information (the 'feed') an LLM agent consumes before making a decision can significantly and causally steer its final choice, often overr…

View →

cs.AIcs.CLcs.CRRecentMay 30, 2026

Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults

Rana Muhammad Usman

The paper demonstrates that the sequence and composition of external information (the 'feed') an LLM agent consumes can significantly and causally steer its final decisions, often overriding its defau…

View →

cs.CRcs.AIRecentMay 18, 2026

Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks

Doohee You

The paper proposes the Triple-tier Anomaly Defense (TRIAD) framework, a predictive model that treats safety verification as a dynamic trajectory problem to detect cumulative, cross-modal poisoning in…

View →

cs.LGcs.CYRecentJun 1, 2026

Model Multiplicity and Predictive Arbitrariness in Recidivism Risk Assessment

Ashwin Singh, Carlos Castillo

The paper investigates predictive multiplicity and arbitrariness in recidivism risk assessment, finding that similarly accurate models often exhibit high predictive agreement, and proposes a simple po…

View →

cs.AIRecentMay 27, 2026

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

Yubo Li, Ramayya Krishnan, Rema Padman

The paper identifies a failure mode called unfaithful capitulation (UC), where reasoning models maintain a correct internal thought process (chain-of-thought) but output an incorrect final answer when…

View →

cs.CLcs.AIcs.CRRecentMay 13, 2026

ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety

Michael S. Lee, Yash Maurya, Drew Rein, Bert Herring +12 more

The paper introduces ROK-FORTRESS, a novel bilingual, culturally adversarial benchmark that demonstrates that LLM safety behavior in high-stakes scenarios is significantly shaped by the interaction be…

View →

cs.AIRecentJun 1, 2026

Bridging the Last Mile of Time Series Forecasting with LLM Agents

Yuhua Liao, Zetian Wang, Qiangqiang Nie, Zhenhua Zhang

The paper introduces an LLM-agent framework to solve the 'last-mile forecasting' problem, bridging the gap between raw statistical predictions and business-ready forecasts by incorporating weakly stru…

View →

cs.CRcs.AIcs.LGRecentMay 21, 2026

TimeGuard: Channel-wise Pool Training for Backdoor Defense in Time Series Forecasting

Quang Duc Nguyen, Siyuan Liang, Yiming Li, Fushuo Huo +1 more

The paper proposes TimeGuard, a novel channel-wise pool training defense, to significantly improve the robustness of time series forecasting against backdoor attacks by addressing signal dilution and…

View →

cs.AIcs.CLcs.CRRecentApr 27, 2026

An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress

Hikmat Karimov, Rahid Zahid Alekberli

The paper proposes a novel information-geometric framework to analyze LLM stability by integrating task utility, external entropy, and internal structural proxies, showing this composite score improve…

View →

cs.CLcs.CRRecentMay 4, 2026

ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming

Mario Rodríguez Béjar, Francisco J. Cortés-Delgado, S. Braghin, Jose L. Hernández-Ramos

ContextualJailbreak introduces an evolutionary red-teaming strategy that performs automated search over simulated multi-turn primed dialogues, achieving high jailbreak rates across multiple state-of-t…

View →

cs.CLcs.AIRecentMay 31, 2026

TimeSage-MT: A Multi-Turn Benchmark for Evaluating Agentic Time Series Reasoning

Yaxuan Kong, Qingren Yao, Yuqi Nie, Yichen Li +6 more

The paper introduces TimeSage-MT, a comprehensive multi-turn benchmark designed to rigorously test an LLM agent's ability to perform complex, evolving time series analysis, revealing critical gaps in…

View →

cs.CYcs.AIRecentMay 28, 2026

AI Loss of Control Incident Management: Response & Resilience

Ross Gruetzemacher

This paper introduces a foundational framework and taxonomy for managing catastrophic AI loss of control (LOC) incidents, providing a proportional guide for response based on the severity and recovera…

View →

cs.LGcs.CRRecentJun 1, 2026

Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment

Yiran Qiao, Jing Chen, Jiaqi Xu, Yang Liu +2 more

The paper proposes a novel framework, LPCD, that uses latent causal modeling to robustly assess evolving adversarial risks in live streaming by decoupling malicious intent from superficial tactical sh…

View →