Similar to 2605.28232 · 20 results
OccuReward introduces an LLM-guided framework and a Comfort Equity Index (CEI) to shape building energy rewards, demonstrating that iterative refinement significantly improves occupant comfort equity…
This paper proposes an Explainable Deep Reinforcement Learning (XRL) framework to optimize energy management in complex buildings, demonstrating that on-policy algorithms provide superior cost reducti…
This paper demonstrates that a complex deep reinforcement learning policy for power grid control can be successfully distilled into a lightweight, auditable decision tree and random forest surrogate t…
The paper proposes an energy-efficient drag reduction strategy for turbulent flows by combining Multi-Agent Deep Reinforcement Learning with SHAP-guided explainable deep learning, achieving superior p…
EnergyMamba proposes an uncertainty-aware, graph-enhanced selective state space model to significantly improve both the accuracy and reliability of energy consumption prediction by explicitly modeling…
The paper proposes an iCEM+TL framework that combines the Sample-efficient Cross-Entropy Method with Transfer Learning and Reward Redesign to improve robotic motion planning for complex tasks like sta…
Anthony GX-Chen, Ankit Anand, Gheorghe Comanici, Zaheer Abbas +6 more
The paper proposes a novel RL framework that naturally induces diverse agent behavior by reformulating the objective to treat the reward as a distribution over functions, making diversity a rational r…
The paper proposes Hysteretic Policy Optimization (HPO) and its adaptive variant (A-HPO) to stabilize reinforcement learning training in sparse-reward environments by better balancing positive and neg…
Hongru Hou, Tiehua Mei, Denghui Geng, Jinhui Huang +4 more
The paper proposes ProRL, an effective Reinforcement Learning framework that rectifies gradient estimation deficiencies to optimize proactive recommendation paths, significantly outperforming existing…
The paper proposes a feasible-reward-set framework to perform Inverse Reinforcement Learning (IRL) when data comes from multiple imperfect demonstrators, providing theoretical guarantees and practical…
The paper demonstrates that using Reinforcement Learning from Verifiable Rewards (RLVR) significantly improves small language models' functional correctness in code generation, particularly when combi…
Yujie Wang, Siwei Chen, Longzan Luo, Xinyi Liu +3 more
The paper proposes DARTS, a distribution-aware active rollout trajectory shaping method that fundamentally accelerates LLM reinforcement learning by actively shaping the long-tail response distributio…
CARE-RL introduces a framework combining protocol-aware reward generation and capability-aware optimization to effectively mitigate cross-domain conflicts in multi-domain reinforcement learning for LL…
The paper introduces a unified Physics-Informed Deep Learning (PIDL) framework that simultaneously enforces physical laws and information-theoretic bounds, demonstrating robust, domain-agnostic entrop…
The paper models how AI-driven data center demand stresses the electrical grid, finding that relying solely on renewable energy certificates (RECs) is insufficient and that on-site storage and spatial…
The paper proposes an uncertainty-aware transfer learning framework using the Temporal Fusion Transformer (TFT) to achieve robust and scalable energy forecasting across different buildings, demonstrat…
Wangyi Mei, Zhouhong Gu, Zhenhan Bai, Yin Cai +8 more
The paper proposes Deep Research as Rubric (DR-rubric), a novel evidence-driven framework that treats rubric construction itself as a research problem to generate fine-grained, scalable reward signals…
The paper introduces a U-Net deep learning surrogate model to accelerate Quality-Diversity optimization for urban layout design, demonstrating that this spatial approach enables highly accurate climat…
Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more
The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…
The paper proposes S3TS, a novel tree search algorithm that simultaneously handles both non-linear system models and explicit uncertainties (scenarios) for advanced energy planning, achieving near-opt…