~ similar to 2606.00408· 20 results
Yongjie Wang, Xinyue Zhang, Kunhong Yao, Zhiwei Zeng +3 more
The paper introduces the concept of Search-Time Contamination (STC), demonstrating that deep research agents can leak information from public benchmarks via web search, leading to an overestimation of…
Pengcheng Jiang, Zhiyi Shi, Kelly Hong, Xueqiang Xu +4 more
The paper introduces Harness-1, a search agent that separates semantic decision-making from state management by using a stateful search harness, achieving state-of-the-art performance across diverse r…
Lu Yi, Runlin Lei, Liuyi Yao, Yuexiang Xie +5 more
The paper introduces Adaptive Context Management (AdaCoM), an external context manager that uses reinforcement learning to improve the performance of frozen LLM agents on long-horizon tasks by intelli…
The paper introduces Agent-Radar, a training-free method that dynamically steers multi-agent attention toward relevant context using a novel decay mechanism, significantly improving performance in lon…
The paper proposes a unified framework to evaluate how different types of memory transfer benefit multi-trajectory inference for tool-use LLM agents, finding that the optimal memory method depends cri…
Zhipeng Qian, Zihan Liang, Yufei Ma, Ben Chen +6 more
The paper introduces Plan, a structured agentic behavior that decomposes multi-hop questions into ordered sub-questions before retrieval, and proposes a self-bootstrapping paradigm to train it without…
The paper identifies 'memory-induced tool-drift,' a systematic vulnerability where personality biases stored in an LLM agent's memory silently corrupt tool-calling decisions, even when those biases ar…
HuiMing Fan, Xiao Wang, Zheng Chu, Qianyu Wang +4 more
The paper argues that current search agents often verify existing knowledge rather than genuinely searching, and introduces LiveBrowseComp, a new benchmark to measure true evidence-driven discovery.
Critic-R introduces a novel framework that uses a critic model to provide natural language introspective feedback, significantly improving the performance of agentic search systems by optimizing retri…
Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more
The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…
The paper demonstrates that extended pure neural reasoning fails on complex, deterministic state-tracking tasks beyond a certain 'Deterministic Horizon,' necessitating the integration of external tool…
The paper introduces AURA, an LLM-powered mask-reconstruct framework, to improve text anonymization by enhancing resistance to agentic web-search re-identification while better preserving contextual u…
The paper introduces AURA, an LLM-powered mask-reconstruct framework, to improve text anonymization by enhancing resistance to agentic web-search re-identification while better preserving contextual u…
Dayong Ye, Tainqing Zhu, Congcong Zhu, Feng He +4 more
The paper proposes a comprehensive framework for LLM-based agent unlearning, enabling agents to selectively forget specific knowledge (states, trajectories, or environments) while maintaining performa…
The paper introduces Dr-CiK, a new benchmark designed to evaluate agents' ability to proactively discover, filter, and utilize relevant external context for time series forecasting, demonstrating that…
The paper introduces VibeSearchBench, a new benchmark designed to evaluate long-horizon, proactive search capabilities, demonstrating that current state-of-the-art LLM agents are still significantly i…
Alireza Salemi, Chang Zeng, Atharva Nijasure, Jui-Hui Chung +3 more
GrepSeek introduces a novel direct corpus interaction (DCI) search agent that trains an LLM to find and compose evidence from large text corpora by issuing executable shell commands, achieving state-o…
Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more
The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…
This paper identifies Security-Recall Divergence (SRD), demonstrating that omission constraints (prohibitions) decay significantly in long-context LLM conversations, while commission constraints (requ…
Wei Zheng, Yang Yan, Yiyang Shao, Jinyang Li +5 more
The paper proposes A2X, an LLM-native progressive-disclosure scheme that structures service taxonomies hierarchically and searches them layer-by-layer at query time, solving context overflow and impro…