Similar to 2605.30576 · 20 results
SARAD proposes a novel safety-aware hybrid framework that combines Large Language Models (LLMs) and Deep Reinforcement Learning (DRL) to improve autonomous driving decision-making by replacing random…
The paper introduces a novel shielding framework for Robust MDPs (RMDPs) that guarantees safety under worst-case transition probabilities, enabling safe reinforcement learning even when transition dyn…
Xinyi Ning, Zilin Bian, Dachuan Zuo, Semiha Ergan +1 more
The paper proposes a Risk Horizon Profiling (RHP) module that uses a continuous potential field model to profile future risk distributions, significantly improving trajectory prediction accuracy in bo…
The paper proposes an algorithmic method using conformal prediction to formally certify high-probability safety for Belief-Space Neural Safety Filters (BeliefSF), significantly improving safety guaran…
This paper demonstrates that reasoning-enabled Vision-Language-Action (VLA) models for autonomous driving are highly vulnerable to realistic input perturbations, significantly compromising both reason…
This paper surveys the risks associated with world models, proposing a unified threat model and demonstrating adversarial attacks that show world models require rigorous safety standards comparable to…
The paper introduces Posterior Hybrid Bayesian Belief (PhyB), a novel framework that reformulates policy optimization in Bayesian Offline RL by approximating expectations as a convex combination over…
Qi Lan, Yining Tang, Yu Shen, Yi Zhou +3 more
RiskFlow is a novel framework that generates realistic and safety-critical multi-agent traffic scenarios by reformulating trajectory generation as a single-pass transport problem in the action space.
Anthony GX-Chen, Ankit Anand, Gheorghe Comanici, Zaheer Abbas +6 more
The paper proposes a novel RL framework that naturally induces diverse agent behavior by reformulating the objective to treat the reward as a distribution over functions, making diversity a rational r…
Yan Wang, Zhixuan Chu, Zihao Xue, Zhen Bi +8 more
The paper introduces ConsisGuard, a framework that addresses the 'deliberation-to-enforcement gap' in LLM guardrails by ensuring that the reasoning process is faithfully and consistently translated in…
The paper proposes a novel Bayesian framework to learn the optimal decision strategy for the stochastic shortest path problem by directly constructing the posterior beliefs for the action-value functi…
The paper introduces ReMax, a novel objective function that naturally encourages stochastic exploration in policy gradient reinforcement learning by evaluating expected maximum returns over multiple s…
Dongwook Choi, Taeyoon Kwon, Bogyung Jeong, Minju Kim +5 more
EMBGuard introduces a novel, MLLM-based safety guardrail that explicitly identifies and explains physical hazards from (visual observation, action) pairs, enabling safer planning for embodied agents.
Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui +4 more
The paper introduces RUBAS, a rubric-based reinforcement learning framework that improves agent safety by providing fine-grained, multi-dimensional rewards for complex tool-use scenarios.
The paper introduces AgenticRL, a self-refining reinforcement learning framework that uses a multimodal GPT agent to automatically design, refine, and deploy reward functions for complex UAV navigatio…
Zhepei Hong, Lin Wang, Liting Li, Haokai Ma +4 more
The paper proposes TRACE, a trajectory risk-aware compression method, to effectively aggregate sparse and delayed safety evidence across long agent trajectories, achieving state-of-the-art performance…
Ting Xu, Xu He, Yupu Lu, Jiankai Sun +3 more
The paper analyzes the entropy dynamics of Chain-of-Thought (CoT) reasoning, identifying a transition from an exploratory Uncertainty Region to a stable Confidence Region, which enables superior early…
Tianle Zeng, Hanjing Ye, Jianwei Peng, Jingwen Yu +2 more
The paper proposes a memory-augmented, traversability-aware framework for outdoor VLN that maintains stable, goal-consistent guidance even when semantic cues are interrupted or unavailable.
LiSA introduces a conservative policy induction framework that enhances fixed AI guardrails by converting sparse, noisy failure reports into reusable, generalized policies, significantly improving saf…
Lichao Wang, Zhaoxing Ren, Tianzhuo Yang, Jiaming Ji +3 more
SafeMCP is a server-side defense plugin that uses look-ahead reasoning to proactively filter and constrain tool acquisition for LLM agents, thereby mitigating catastrophic risks associated with expand…