Similar to 2606.02204 · 18 results
Tianzhuo Yang, Zihan Shen, Zirui Mi, Zhaoyi Zhang +6 more
The paper introduces MiraBench, a new benchmark that evaluates the action-conditioned reliability of robotic world models, finding that visual fidelity is insufficient and that optimism bias is a perv…
The paper introduces NaRA, a noise-aware LoRA technique that dynamically adapts fine-tuning parameters based on the noise level during diffusion, significantly improving the performance of Diffusion L…
Weile Chen, Bingchen Miao, Qifan Yu, Wendong Bu +5 more
The paper proposes SCALE, a self-improving web agent framework that uses adversarial roles and graph exploration to autonomously discover agent limitations and enhance adaptability in complex web envi…
The paper introduces ARCA, a novel credit assignment method that measures token salience directly from the adapter's residual hidden state, addressing the degeneracy of standard intrinsic signals when…
The paper demonstrates that supervised fine-tuning significantly outperforms frontier zero-shot large language models for screen-conditioned action prediction on the PiSAR benchmark, highlighting the…
Qi Sun, Siyue Zhang, Yulin Chen, Yuxiang Xue +2 more
The paper proposes Preference Delta Aggregation (PDA), a framework that aggregates multiple weak preference signals derived from smaller model pairs using LoRA merging to significantly boost the perfo…
This paper investigates the robustness of world models in vision-based quadrotor navigation and identifies factors governing their quality.
Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more
The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.
Dongrui Liu, Yu Li, Zhonghao Yang, Peng Wang +46 more
The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex open-world agent deployments.
Dongrui Liu, Yu Li, Zhonghao Yang, Peng Wang +46 more
The paper introduces AgentDoG 1.5, a lightweight and scalable alignment framework that significantly improves AI agent safety and security for complex, open-world agentic scenarios.
The paper formally addresses the challenging question of cross-domain transferability of latent predictive models by proposing a structured framework that quantifies the relationship between source an…
Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more
The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…
Zhuoyun Yu, Xin Xie, Wuguannan Yao, Chenxi Wang +3 more
SkillAdaptor is a novel, training-free framework that enables stable, step-level adaptation of external skills for LLM agents by precisely attributing failures to specific skills.
Wenhang Shi, Jinhao Dong, Yiren Chen, Zhe Zhao +3 more
The paper introduces Grounded Agentic Interaction Synthesis (GAIS), a framework that generates high-quality, diverse, and complex agentic training data by anchoring tasks to real-world protocols, sign…
Zhengyang Zhao, Shengjie Ye, Lu Ma, Hao Liang +2 more
The paper introduces Andes, a framework that treats data generation as a plug-and-play agent skill, enabling autonomous alignment of LLMs by providing an intelligent, closed-loop data synthesis interf…
Taiyi Su, Jian Zhu, Tianjian Wang, Youzhang He +8 more
DeMaVLA is a generalizable Vision-Language-Action foundation model designed for deformable object manipulation, achieving strong real-world performance on folding tasks by leveraging large-scale real-…
This study provides a comprehensive benchmark of 10 frontier LLMs on 200 offensive cybersecurity tasks, finding that environment tooling and model selection are the primary performance drivers, with C…