Papers similar to 2606.00949

20 results

cs.AI · May 27, 2026

PIRS: Physics-Informed Reward Shaping for SAC-Based Building Energy Management

The paper introduces PIRS, a physics-informed reward shaping method that replaces ad-hoc comfort proxies with the ISO 7730 PMV formulation, enabling deep reinforcement learning agents to achieve energ…

cs.AI · Jun 1, 2026

Explainable Data-driven Deep Reinforcement Learning Methods for Optimal Energy Management in Buildings

Hallah Shahid Butt, Qiong Huang, Gökhan Demirel, Kevin Förderer +5 more

This paper proposes an Explainable Deep Reinforcement Learning (XRL) framework to optimize energy management in complex buildings, demonstrating that on-policy algorithms provide superior cost reducti…

cs.LG · cs.AI · May 30, 2026

Interpretable Policy Distillation for Power Grid Topology Control

Aleksandra Dmitruka, Karlis Freivalds

This paper demonstrates that a complex deep reinforcement learning policy for power grid control can be successfully distilled into a lightweight, auditable decision tree and random forest surrogate t…

cs.RO · cs.AI · cs.LG · Jun 1, 2026

Network Distributed Multi-Agent Reinforcement Learning for Consensus Control of Quadcopters

Youssef Mahran, Zeyad Gamal, Aamir Ahmad, Ayman El-Badawy

The paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework that enables stable, scalable consensus control for large swarms of quadcopters using only local neighbo…

cs.CR · cs.LG · cs.MA · Apr 6, 2026

Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning

Yiyao Zhang, Diksha Goel, Hussain Ahmad

The paper introduces C-MADF, a causally constrained multi-agent framework that significantly reduces false positives in autonomous cyber defense by restricting response actions to structurally consist…

cs.LG · cs.AI · Jun 1, 2026

Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing

Azal Ahmad Khan, Ammar Ahmed, Zeshan Fayyaz, Sheng Di +2 more

The paper introduces Straggler-Aware Group Control (SAGC), a dynamic group-size controller that optimizes synchronous on-policy RL training by adapting group size to minimize delays caused by slow rol…

cs.CR · cs.LG · Mar 24, 2026

Explainable Threat Attribution for IoT Networks Using Conditional SHAP and Flow Behavior Modelling

Samuel Ozechi, Jennifer Okonkwoabutu

This paper proposes an explainable threat attribution system for IoT networks that uses SHAP and flow behavior modeling to accurately classify and explain over 30 distinct attack variants into 8 meani…

cs.RO · cs.AI · Jun 2, 2026

Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

Roohan Ahmed Khan, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Dzmitry Tsetserukou

The paper introduces AgenticRL, a self-refining reinforcement learning framework that uses a multimodal GPT agent to automatically design, refine, and deploy reward functions for complex UAV navigatio…

cs.AI · cs.CR · eess.SY · May 4, 2026

Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

Kerri Prinos, Lilianne Brush, Cameron Denton, Zhanqi Wang +4 more

The paper proposes a tool-mediated LLM architecture for autonomous cyber defense, formally proving its stability and demonstrating that it significantly reduces an attacker's expected payoff in real-w…

cs.AI · Jun 1, 2026

Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning

Liuji Chen, Dianxing Tang, Xing Shi, Dingshuo Chen +3 more

The paper proposes EAPO, a framework that enables agentic models to learn when to forgo using external tools, thereby mitigating tool abuse while maintaining high reasoning accuracy.

cs.GT · cs.LG · Jun 4, 2026

DNQ: Deep Nash Q-Network for Partially Observable n-Player Games

Qintong Xie, Edward Koh, Xavier Cadet, Peter Chin

The paper proposes DNQ, a scalable solver-in-the-loop framework for training agents in multi-turn simultaneous bidding games by leveraging pairwise payoff estimation to approximate complex equilibrium…

cs.CR · cs.AI · Mar 22, 2026

DeepXplain: XAI-Guided Autonomous Defense Against Multi-Stage APT Campaigns

Trung V. Phan, Thomas Bauschert

DeepXplain introduces an explainable deep reinforcement learning framework that enhances the trustworthiness and effectiveness of autonomous cyber defense against multi-stage APT campaigns by integrat…

cs.RO · Jun 3, 2026

X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation

Rachel Luo, Michael Watson, Apoorva Sharma, Heng Yang +5 more

This paper introduces X4Val, a framework for variance-reduced real-world metric estimation using non-paired, multi-domain data.

cs.RO · cs.AI · May 29, 2026

DRL-Based Pose Control for Double-Ackermann Robots Under Actuation Uncertainties

Oussama Zaim, Mélodie Daniel, Aly Magassouba, Miguel Aranda +1 more

The paper proposes a robust sim-to-sim-to-real DRL approach to enable double-Ackermann robots to achieve full pose control despite significant actuation uncertainties and discrepancies between simulat…

physics.flu-dyn · cs.AI · cs.LG · May 31, 2026

Emergent Transfer of a Physics Foundation Model from Simulation to Laboratory Turbulence

Payel Mukhopadhyay, Stefan S. Nixon, Romain Watteaux, Michael McCabe +19 more

The authors demonstrate that a physics foundation model, finetuned on simulation data, can successfully predict complex laboratory fluid dynamics, specifically resolving a long-standing discrepancy in…

cs.LG · cs.AI · May 31, 2026

Physics-Informed Deep Learning for Entropy Prediction in Heterogeneous Systems: Thermodynamic and Information-Theoretic Case Studies

Biswajeet Sahoo, Debadutta Patra

The paper introduces a unified Physics-Informed Deep Learning (PIDL) framework that simultaneously enforces physical laws and information-theoretic bounds, demonstrating robust, domain-agnostic entrop…

cs.LG · cs.AI · May 29, 2026

DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning

Yujie Wang, Siwei Chen, Longzan Luo, Xinyi Liu +3 more

The paper proposes DARTS, a distribution-aware active rollout trajectory shaping method that fundamentally accelerates LLM reinforcement learning by actively shaping the long-tail response distributio…

cs.CR · cs.AI · Apr 9, 2026

Building Better Environments for Autonomous Cyber Defence

Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson +11 more

This paper synthesizes expert knowledge from a workshop to provide a comprehensive framework and best-practice guidelines for developing high-quality reinforcement learning environments for autonomous…

cs.RO · cs.AI · cs.NE · Jun 4, 2026

Sample-efficient Low-level Motion Planning for Robotic Manipulation Tasks via Zero-shot Transfer Learning

Yuanzhi He, Victor Romero-Cano, José J. Patiño, Juan David Hernández +2 more

The paper proposes an iCEM+TL framework that combines the Sample-efficient Cross-Entropy Method with Transfer Learning and Reward Redesign to improve robotic motion planning for complex tasks like sta…

cs.LG · cs.AI · May 29, 2026

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Yiming Ren, Yiran Xu, Zicheng Lin, Chufan Shi +7 more

The paper proposes S2L-PO, a framework that uses smaller, naturally diverse models as structured explorers to enhance the policy-level diversity and performance of larger language models during traini…