20 results for “LLM agents”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
The paper investigates how LLM agents determine the security of their execution environment in a simulated negotiation setting, finding that while they can detect danger, they cannot reliably verify s…
The paper proposes Dynamic Cyber Ranges, an advanced cyber range environment using LLM-driven Defender agents to counter the saturation of traditional security benchmarks, demonstrating that these dyn…
Wenjie Qu, Ming Xu, Peiran Wang, Shengfang Zhai +2 more
The paper proposes defining 'intent-to-execution integrity' as the necessary end-to-end correctness property for securing LLM agents, arguing that current defenses are insufficient due to untrusted co…
Agent Audit is a novel security analysis system that comprehensively audits LLM agent applications by examining the entire software stack—including tool code, configuration, and prompts—to detect a wi…
Agent libOS introduces a library-OS-inspired runtime substrate that treats LLM agents as schedulable processes, providing explicit capability control and robust auditing for long-running, stateful age…
Xiaoze Liu, Ruowang Zhang, Amir H. Abdi, Michel Galley +4 more
The paper proposes replacing expensive, always-on LLM calls for proactive agent triggering with a specialized Temporal-Graph-Learning (TGL) model, significantly improving efficiency and performance.
This paper investigates the 'faithfulness gap' in LLM agents—the discrepancy between stated reasoning and actual action—by decomposing it into two opposing steps: reasoning-to-conclusion and conclusio…
The paper evaluates Language Model Agents (LMAs) for red-teaming by benchmarking their ability to perform lateral movement, finding that expert-defined action plans are most effective, though all moda…
This paper empirically demonstrates that the choice of plan representation (e.g., checklist vs. narrative) significantly impacts the robustness and success rate of LLM-based web agents.
Pengyu Zhu, Lijun Li, Yaxing Lyu, Qianxin Luo +7 more
The paper introduces a unified framework to fairly evaluate LLM agentic capabilities by standardizing diverse benchmarks and separating the effects of the LLM model from the surrounding framework and…
Md Nakhla Rafi, Md Ahasanuzzaman, Dong Jae Kim, Zhijie Wang +1 more
FALAT is a diagnostic framework that treats failure attribution in complex LLM agent trajectories as a dependency-guided search problem, successfully identifying both the responsible agent and the dec…
The paper argues that LLM agent security is fundamentally an agent-human interaction (AHI) problem, demonstrating that industry practices rely on human-centric mechanisms while academic research focus…
Qian'ang Mao, Jiaxin Wang, Ya Liu, Li Zhu +2 more
The paper develops a unified, cross-layer security framework for autonomous LLM agents operating in agentic commerce, identifying key attack vectors and proposing a layered defense architecture.
Agent-Sentry is a runtime defense system that bounds the execution of LLM agents by learning a profile of benign behavior, effectively blocking malicious injections while maintaining high compatibilit…
The paper simulates bargaining scenarios using LLM agents to analyze how optimizing agents for financial profit affects their honesty and trust, finding that while fine-tuning improves deal-making, it…
The paper introduces the concept of policy-invisible violations in LLM agents and proposes Sentinel, a counterfactual graph simulation framework, which significantly improves policy enforcement accura…
The paper introduces Hyperparam, a set of lightweight JavaScript libraries designed to enable direct, model-aware querying of unstructured data (like agent traces) within client-side AI applications.
Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo +5 more
The paper proposes a modular agent framework and novel learning methods to design and optimize practical, cost-effective, and controllable LLM-based agentic systems.
Wei Zheng, Yang Yan, Yiyang Shao, Jinyang Li +5 more
The paper proposes A2X, an LLM-native progressive-disclosure scheme that structures service taxonomies hierarchically and searches them layer-by-layer at query time, solving context overflow and impro…
Wentao Hu, Zhendong Chu, Yiming Zhang, Junda Wu +5 more
The paper introduces SkillBrew, a multi-objective framework that treats skill bank curation as a constrained optimization problem to build efficient and well-curated skill repositories for LLM agents.