Similar to 2606.00949 · 20 results
The paper introduces PIRS, a physics-informed reward shaping method that replaces ad-hoc comfort proxies with the ISO 7730 PMV formulation, enabling deep reinforcement learning agents to achieve energ…
This paper proposes an Explainable Deep Reinforcement Learning (XRL) framework to optimize energy management in complex buildings, demonstrating that on-policy algorithms provide superior cost reducti…
This paper demonstrates that a complex deep reinforcement learning policy for power grid control can be successfully distilled into a lightweight, auditable decision tree and random forest surrogate t…
The paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework that enables stable, scalable consensus control for large swarms of quadcopters using only local neighbo…
The paper introduces C-MADF, a causally constrained multi-agent framework that significantly reduces false positives in autonomous cyber defense by restricting response actions to structurally consist…
Azal Ahmad Khan, Ammar Ahmed, Zeshan Fayyaz, Sheng Di +2 more
The paper introduces Straggler-Aware Group Control (SAGC), a dynamic group-size controller that optimizes synchronous on-policy RL training by adapting group size to minimize delays caused by slow rol…
This paper proposes an explainable threat attribution system for IoT networks that uses SHAP and flow behavior modeling to accurately classify and explain over 30 distinct attack variants into 8 meani…
The paper introduces AgenticRL, a self-refining reinforcement learning framework that uses a multimodal GPT agent to automatically design, refine, and deploy reward functions for complex UAV navigatio…
Kerri Prinos, Lilianne Brush, Cameron Denton, Zhanqi Wang +4 more
The paper proposes a tool-mediated LLM architecture for autonomous cyber defense, formally proving its stability and demonstrating that it significantly reduces an attacker's expected payoff in real-w…
Liuji Chen, Dianxing Tang, Xing Shi, Dingshuo Chen +3 more
The paper proposes EAPO, a framework that enables agentic models to learn when to forgo using external tools, thereby mitigating tool abuse while maintaining high reasoning accuracy.
The paper proposes DNQ, a scalable solver-in-the-loop framework for training agents in multi-turn simultaneous bidding games by leveraging pairwise payoff estimation to approximate complex equilibrium…
DeepXplain introduces an explainable deep reinforcement learning framework that enhances the trustworthiness and effectiveness of autonomous cyber defense against multi-stage APT campaigns by integrat…
Rachel Luo, Michael Watson, Apoorva Sharma, Heng Yang +5 more
This paper introduces X4Val, a framework for variance-reduced real-world metric estimation using non-paired, multi-domain data.
Oussama Zaim, Mélodie Daniel, Aly Magassouba, Miguel Aranda +1 more
The paper proposes a robust sim-to-sim-to-real DRL approach to enable double-Ackermann robots to achieve full pose control despite significant actuation uncertainties and discrepancies between simulat…
The authors demonstrate that a physics foundation model, finetuned on simulation data, can successfully predict complex laboratory fluid dynamics, specifically resolving a long-standing discrepancy in…
The paper introduces a unified Physics-Informed Deep Learning (PIDL) framework that simultaneously enforces physical laws and information-theoretic bounds, demonstrating robust, domain-agnostic entrop…
Yujie Wang, Siwei Chen, Longzan Luo, Xinyi Liu +3 more
The paper proposes DARTS, a distribution-aware active rollout trajectory shaping method that fundamentally accelerates LLM reinforcement learning by actively shaping the long-tail response distributio…
Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson +11 more
This paper synthesizes expert knowledge from a workshop to provide a comprehensive framework and best-practice guidelines for developing high-quality reinforcement learning environments for autonomous…
The paper proposes an iCEM+TL framework that combines the Sample-efficient Cross-Entropy Method with Transfer Learning and Reward Redesign to improve robotic motion planning for complex tasks like sta…
Yiming Ren, Yiran Xu, Zicheng Lin, Chufan Shi +7 more
The paper proposes S2L-PO, a framework that uses smaller, naturally diverse models as structured explorers to enhance the policy-level diversity and performance of larger language models during traini…