~ similar to 2606.06106· 19 results
HuiMing Fan, Xiao Wang, Zheng Chu, Qianyu Wang +4 more
The paper argues that current search agents often verify existing knowledge rather than genuinely searching, and introduces LiveBrowseComp, a new benchmark to measure true evidence-driven discovery.
The paper introduces 'Routing Hijacking,' a severe attack where malicious clients forge semantic profiles in Federated RAG systems to misroute target queries, and proposes a trust-aware post-routing f…
Zhen Chen, Yibing Liu, Weihao Xie, Yu Liang +2 more
The paper proposes formulating RAG design as an architecture search problem and introduces RAISE, a comprehensive framework and benchmark for systematically optimizing RAG hyperparameters.
The paper systematically evaluates advanced retrieval-augmented generation (RAG) architectures for Cyber Threat Intelligence (CTI), demonstrating that a hybrid graph-text approach significantly improv…
Nahyun Lee, Dongkeun Yoon, Guijin Son, Geewook Kim +11 more
The paper introduces K-BrowseComp, a new web-browsing agent benchmark of 400 problems grounded in Korean contexts, demonstrating that current frontier LLMs struggle significantly with complex, context…
The paper proposes SPHERE, a novel framework that uses large language models to create semantic user personas, enabling effective cross-domain recommendation knowledge transfer between completely disj…
SkillPager is a novel two-stage framework that efficiently selects minimal, execution-sufficient context from large procedural skill documents by leveraging typed semantic nodes, significantly reducin…
Youquan Xian, Xueying Zeng, Lingjia Meng, Lei Cui +5 more
The paper proposes SATA, a semantics-aware traffic augmentation framework, to significantly improve the generalization of website fingerprinting models by addressing variability in resource compositio…
Ojas Nimase, Zhe Chen, Gengpei Qi, Yue Zhao +1 more
The paper introduces GEO-Bench, a unified benchmark that standardizes the evaluation of various generative engine optimization (GEO) ranking manipulation attacks, demonstrating that black-box content…
Ojas Nimase, Zhe Chen, Gengpei Qi, Yue Zhao +1 more
GEO-Bench introduces a standardized benchmark to compare various ranking manipulation attacks (both black-box and white-box) on generative engines, demonstrating that black-box content rewriting can b…
The paper introduces GTA, a scalable framework for generating realistic, multi-hop web-agent tasks with dense, executable trajectories, addressing the current lack of process-level supervision in web…
MEMENTO proposes a novel framework that treats the open web as a continuous learning signal, enabling agents to acquire task-specific expertise and reusable research strategies in low-data domains wit…
Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang +11 more
EviLink addresses the ambiguity of schema linking in Text-to-SQL by treating it as an uncertainty-aware inference over multiple plausible SQL paths, significantly improving recall and efficiency.
The paper introduces COPF, an online framework that ensures deployment-stable counterfactual fairness in link recommendation systems operating on evolving graphs by monitoring and controlling group di…
Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary +11 more
The paper conducts a preliminary safety evaluation of the open-weight LLM Kimi K2.5, finding that while it is highly capable, it exhibits concerning dual-use risks, particularly regarding CBRNE misuse…
Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li +6 more
Xetrieval introduces an embedding-level framework to mechanistically explain dense retrieval decisions by decomposing high-dimensional embeddings into sparse, human-interpretable features.
Roy Ricaldi, Maximilian Schafer, Philipp Zech, Luca Allodi +2 more
This study provides a longitudinal analysis of dark web content, revealing that cybercrime discussions are dominated by a few persistent core topics rather than rapidly shifting themes.
The paper introduces WebSP-Eval, a new framework to evaluate web agents on complex website security and privacy tasks, finding that current state-of-the-art models struggle significantly with stateful…
LLM-FACETS introduces an open-source, privacy-preserving framework designed to enable non-technical domain experts and compliance officers to audit and evaluate the transparency and accountability of…