ArXivCSExplorer
Built by Teycir Ben Soltane

Papers similar to 2606.00780 · 18 results

cs.LG · cs.AI · Jun 1, 2026

Policy and World Modeling Co-Training for Language Agents

Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more

The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…

cs.RO · Jun 3, 2026

Generalization of World Models under Environmental Variability for Vision-based Quadrotor Navigation

Luca Zanatta, Grzegorz Malczyk, Kostas Alexis

This paper investigates the robustness of world models in vision-based quadrotor navigation and identifies factors governing their quality.

cs.LG · cs.AI · May 27, 2026

SPAR: Support-Preserving Action Rectification

Jiaxin Zhao, Weihang Pan, Xun Liang, Binbin Lin

SPAR introduces a novel framework that rectifies action policies by performing local fine-tuning in a residual space anchored to a pure behavior cloning policy, achieving state-of-the-art performance…

cs.LG · cs.AI · May 28, 2026

In-Context Reward Adaptation for Robust Preference Modeling

Zhenyu Sun, Zheng Xu, Ermin Wei

The paper proposes In-Context Reward Adaptation, a transformer-based framework that uses in-context learning and auxiliary signals (like human response time) to robustly model diverse and unseen human…

cs.LG · cs.AI · May 29, 2026

The Terminal Representation in Reinforcement Learning

Amir Esterhuysen, Anders Jonsson

The paper introduces the Terminal Representation (TR), a novel, lower-dimensional, and structurally distinct formulation for encoding reward-weighted trajectories in RL that bypasses the need for eige…

cs.LG · cs.AI · May 30, 2026

Task diversity produces systematic transfer but inhibits continual reinforcement learning

Purab Seth, Neil Shah, Kunal Jha, Samuel J. Gershman +2 more

The paper introduces Banyan, a new continual reinforcement learning benchmark, demonstrating that while task diversity enables local transfer across distribution shifts, it does not guarantee sustaine…

cs.AI · May 28, 2026

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

Ziyan Liu, Zhezheng Hao, Yeqiu Chen, Hong Wang +6 more

The paper introduces Metacognitive Memory Policy Optimization (MMPO), a novel memory training approach that optimizes LLM memory not based on final task success, but on minimizing epistemic uncertaint…

cs.AI · cs.LG · May 30, 2026

Regularized Offline Policy Optimization with Posterior Hybrid Bayesian Belief

Hongqiang Lin, Pengfei Wang, Nenggan Zheng

The paper introduces Posterior Hybrid Bayesian Belief (PhyB), a novel framework that reformulates policy optimization in Bayesian Offline RL by approximating expectations as a convex combination over…

cs.AI · May 30, 2026

Decoupled Behavioral Cloning for Scalable Inductive Generalization in RL from Specifications

Vignesh Subramanian, Subhajit Roy, Suguman Bansal

The paper proposes DIBS, a decoupled behavioral cloning approach that stabilizes inductive generalization in RL by separating task-specific policy learning from the evolution function, leading to impr…

cs.MA · cs.CR · Apr 1, 2026

Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Agents

Dayong Ye, Tianqing Zhu, Congcong Zhu, Feng He +4 more

The paper proposes a comprehensive framework for LLM-based agent unlearning, enabling agents to selectively forget specific knowledge (states, trajectories, or environments) while maintaining performa…

cs.AI · Jun 1, 2026

TERRA: Task-Embedded Reasoning and Representation Architecture for Cross-Domain Applications

Shayan Shokri

The paper formally addresses the challenging question of cross-domain transferability of latent predictive models by proposing a structured framework that quantifies the relationship between source an…

cs.LG · cs.AI · math.OC · May 29, 2026

Agentic Transformers Provably Learn to Search via Reinforcement Learning

Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

This paper demonstrates that transformer-based policies can provably learn complex tree search mechanisms, such as depth-first search, purely through reinforcement learning in a stochastic environment…

cs.LG · cs.AI · cs.CV · May 27, 2026

OISD: On-Policy Internal Self-Distillation of Language Models

Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang +1 more

The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on math…

cs.MA · cs.AI · cs.LG · May 29, 2026

Dreaming Of Others: Latent Teammate Modeling In World Models For Multi-Agent Reinforcement Learning

Tomas Leroy-Stone

The paper proposes extending world models for multi-agent reinforcement learning by factorizing the latent state to explicitly model and predict the unobservable intentions and behaviors of teammates.

cs.AI · cs.LG · stat.ML · Jun 1, 2026

ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL

Zelin He, Haotian Lin, Boran Han, Wei Zhu +5 more

ReSkill is an RL-in-the-loop framework that reconciles skill creation and policy optimization by automatically creating, testing, and refining modular skills alongside the agent's policy learning, lea…

cs.LG · cs.AI · May 28, 2026

Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

The paper proposes a scalable, distributed approach for constrained Multi-Agent Reinforcement Learning by using local consensus over dual variables to ensure global constraint satisfaction without cen…

cs.AI · cs.CR · cs.CY · Apr 16, 2026

Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

Krti Tallam

The paper introduces 'layered mutability,' a framework for analyzing how persistent self-modifying AI agents drift away from intended behavior due to the accumulation of locally reasonable, uncoordina…

cs.AI · cs.CL · Jun 1, 2026

COMAP: Co-Evolving World Models and Agent Policies for LLM Agents

Youwei Liu, Jian Wang, Hanlin Wang, Wenjie Li

COMAP introduces a novel co-evolutionary framework that simultaneously updates textual world models and agent policies through closed-loop interaction, significantly improving long-horizon decision-ma…
