ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2606.00408 · 20 results

cs.CR · cs.AI · Recent · Jun 3, 2026

Search-Time Contamination in Deep Research Agents: Measuring Performance Inflation in Public Benchmark Evaluation

Yongjie Wang, Xinyue Zhang, Kunhong Yao, Zhiwei Zeng +3 more

The paper introduces the concept of Search-Time Contamination (STC), demonstrating that deep research agents can leak information from public benchmarks via web search, leading to an overestimation of…

cs.AI · cs.CL · cs.IR · Recent · Jun 1, 2026

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Pengcheng Jiang, Zhiyi Shi, Kelly Hong, Xueqiang Xu +4 more

The paper introduces Harness-1, a search agent that separates semantic decision-making from state management by using a stateful search harness, achieving state-of-the-art performance across diverse r…

cs.AI · Recent · May 29, 2026

Learning Agent-Compatible Context Management for Long-Horizon Tasks

Lu Yi, Runlin Lei, Liuyi Yao, Yuexiang Xie +5 more

The paper introduces Adaptive Context Management (AdaCoM), an external context manager that uses reinforcement learning to improve the performance of frozen LLM agents on long-horizon tasks by intelli…

cs.AI · Recent · May 28, 2026

Enhancing Multi-Agent Communication through Attention Steering with Context Relevance

Hongxiang Zhang, Yuan Tian, Tianyi Zhang

The paper introduces Agent-Radar, a training-free method that dynamically steers multi-agent attention toward relevant context using a novel decay mechanism, significantly improving performance in lon…

cs.AI · Recent · May 27, 2026

When Does Memory Help Multi-Trajectory Inference for Tool-Use LLM Agents?

Xinzhe Li, Yaguang Tao

The paper proposes a unified framework to evaluate how different types of memory transfer benefit multi-trajectory inference for tool-use LLM agents, finding that the optimal memory method depends cri…

cs.AI · Recent · May 27, 2026

Plan Before Search: Search Agents Need Plan

Zhipeng Qian, Zihan Liang, Yufei Ma, Ben Chen +6 more

The paper introduces Plan, a structured agentic behavior that decomposes multi-hop questions into ordered sub-questions before retrieval, and proposes a self-bootstrapping paradigm to train it without…

cs.CR · cs.LG · Recent · May 24, 2026

Memory-Induced Tool-Drift in LLM Agents

Mahavir Dabas, Jihyun Jeong, Ming Jin, Ruoxi Jia

The paper identifies 'memory-induced tool-drift,' a systematic vulnerability where personality biases stored in an LLM agent's memory silently corrupt tool-calling decisions, even when those biases ar…

cs.AI · Recent · May 27, 2026

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

HuiMing Fan, Xiao Wang, Zheng Chu, Qianyu Wang +4 more

The paper argues that current search agents often verify existing knowledge rather than genuinely searching, and introduces LiveBrowseComp, a new benchmark to measure true evidence-driven discovery.

cs.IR · cs.AI · Recent · May 30, 2026

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani

Critic-R introduces a novel framework that uses a critic model to provide natural language introspective feedback, significantly improving the performance of agentic search systems by optimizing retri…

cs.CL · Recent · May 30, 2026

Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents

Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more

The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…

cs.AI · cs.CL · cs.LG · Recent · May 29, 2026

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

Dongxin Guo, Jikun Wu, Siu Ming Yiu

The paper demonstrates that extended pure neural reasoning fails on complex, deterministic state-tracking tasks beyond a certain 'Deterministic Horizon,' necessitating the integration of external tool…

cs.CR · cs.CL · Recent · May 29, 2026

LLM Anonymization Against Agentic Re-Identification

Ziwen Li, Jianing Wen, Tianshi Li

The paper introduces AURA, an LLM-powered mask-reconstruct framework, to improve text anonymization by enhancing resistance to agentic web-search re-identification while better preserving contextual u…

cs.MA · cs.CR · Recent · Apr 1, 2026

Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Agents

Dayong Ye, Tianqing Zhu, Congcong Zhu, Feng He +4 more

The paper proposes a comprehensive framework for LLM-based agent unlearning, enabling agents to selectively forget specific knowledge (states, trajectories, or environments) while maintaining performa…

cs.AI · cs.LG · Recent · May 27, 2026

Dr-CiK: A Testbed for Foresight-Driven Agents

Yihong Tang, Andrew Robert Williams, Arjun Ashok, Vincent Zhihao Zheng +5 more

The paper introduces Dr-CiK, a new benchmark designed to evaluate agents' ability to proactively discover, filter, and utilize relevant external context for time series forecasting, demonstrating that…

cs.CL · cs.AI · Recent · May 27, 2026

VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild

Xiaohongshu Inc

The paper introduces VibeSearchBench, a new benchmark designed to evaluate long-horizon, proactive search capabilities, demonstrating that current state-of-the-art LLM agents are still significantly i…

cs.CL · cs.AI · cs.IR · Recent · May 28, 2026

GrepSeek: Training Search Agents for Direct Corpus Interaction

Alireza Salemi, Chang Zeng, Atharva Nijasure, Jui-Hui Chung +3 more

GrepSeek introduces a novel direct corpus interaction (DCI) search agent that trains an LLM to find and compose evidence from large text corpora by issuing executable shell commands, achieving state-o…

cs.CL · cs.AI · Recent · May 28, 2026

Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models

Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more

The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…

cs.CR · cs.AI · Recent · Apr 22, 2026

Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents

Yeran Gamage

This paper identifies Security-Recall Divergence (SRD), demonstrating that omission constraints (prohibitions) decay significantly in long-context LLM conversations, while commission constraints (requ…

cs.AI · Recent · May 28, 2026

Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies

Wei Zheng, Yang Yan, Yiyang Shao, Jinyang Li +5 more

The paper proposes A2X, an LLM-native progressive-disclosure scheme that structures service taxonomies hierarchically and searches them layer-by-layer at query time, solving context overflow and impro…
