cs.CR, cs.AI

Agent-Sentry: Bounding LLM Agents via Execution Provenance

Rohan Sequeira, Stavros Damianakis, Umar Iqbal, Konstantinos Psounis

Mar 24, 2026 (revised May 8, 2026)

AI Summary (gemma4:e4b)

Agent-Sentry is a runtime defense system that bounds the execution of LLM agents by learning a profile of benign behavior, effectively blocking malicious injections while maintaining high compatibility with legitimate use.

Abstract

Agentic computing systems, while immensely capable, raise serious security, privacy, and safety concerns. A key issue is that the full set of functionalities offered by these systems, combined with their probabilistic execution flows, is not known beforehand. Given this lack of characterization, it is challenging to validate whether a system has successfully carried out the user's intended task or instead executed irrelevant actions, potentially as a consequence of compromise. We present Agent Sentry, a runtime defense that learns a bound on an agent's benign execution from prior legitimate executions and flags any action that falls outside this bound. Agent Sentry layers three complementary checks: a structural classifier over the sequence of actions and the provenance of each function's arguments; a deterministic allowlist check over sensitive argument values; and an LLM judge, invoked only on the residual of actions where the first two checks cannot safely decide between a legitimate new request and a carefully crafted injection. We demonstrate the effectiveness of Agent Sentry in AgentDojo and AgentDyn by blocking 94.3% of successful injections while allowing 95.1% of benign executions, without modifying the agent, its tools, or the LLM.
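The layering of the three checks can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name in it (`Action`, `LayeredGuard`, the `judge` callback, the `"user"` and `"tool_output"` provenance labels) is hypothetical. The learned structural classifier is replaced by a simple set of (previous tool, tool, argument provenance) patterns seen in benign traces. The rule for combining the checks (decide directly when the first two checks agree, defer to the judge when they disagree) is also an assumption made for illustration, since the abstract does not specify it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Action:
    tool: str                    # name of the function the agent calls
    args: Dict[str, str]         # argument name -> value
    provenance: Dict[str, str]   # argument name -> origin (hypothetical labels)


Pattern = Tuple[str, str, Tuple[Tuple[str, str], ...]]


def features(prev_tool: str, action: Action) -> Pattern:
    """Structural features: previous tool, current tool, argument provenance."""
    return (prev_tool, action.tool, tuple(sorted(action.provenance.items())))


class LayeredGuard:
    """Toy stand-in for a three-layer runtime check (assumed design)."""

    def __init__(self, benign_traces: List[List[Action]],
                 sensitive_args: Set[Tuple[str, str]],
                 judge: Callable[[Action], bool]) -> None:
        self.sensitive_args = sensitive_args
        self.judge = judge  # placeholder for an LLM judge
        self.seen_patterns: Set[Pattern] = set()
        self.allowlist: Dict[Tuple[str, str], Set[str]] = {}
        # "Learn" the benign bound by recording what legitimate traces did.
        for trace in benign_traces:
            prev = "<start>"
            for action in trace:
                self.seen_patterns.add(features(prev, action))
                for name, value in action.args.items():
                    key = (action.tool, name)
                    if key in sensitive_args:
                        self.allowlist.setdefault(key, set()).add(value)
                prev = action.tool

    def check(self, prev_tool: str, action: Action) -> str:
        # Layer 1: structural check over action sequence and provenance.
        structural_ok = features(prev_tool, action) in self.seen_patterns
        # Layer 2: deterministic allowlist over sensitive argument values.
        values_ok = all(
            value in self.allowlist.get((action.tool, name), set())
            for name, value in action.args.items()
            if (action.tool, name) in self.sensitive_args
        )
        if structural_ok and values_ok:
            return "allow"
        if not structural_ok and not values_ok:
            return "block"
        # Layer 3: the two checks disagree, so defer to the judge.
        return "allow" if self.judge(action) else "block"


judge_calls: List[str] = []


def judge(action: Action) -> bool:
    """Dummy judge that records each call and always approves."""
    judge_calls.append(action.tool)
    return True


benign = [[
    Action("read_email", {}, {}),
    Action("send_money", {"recipient": "alice"}, {"recipient": "user"}),
]]
guard = LayeredGuard(benign, {("send_money", "recipient")}, judge)

known = Action("send_money", {"recipient": "alice"}, {"recipient": "user"})
injected = Action("send_money", {"recipient": "mallory"},
                  {"recipient": "tool_output"})
novel = Action("send_money", {"recipient": "bob"}, {"recipient": "user"})

for candidate in (known, injected, novel):
    print(candidate.args, guard.check("read_email", candidate))
```

In this toy setup only the third action reaches the judge: it matches a benign structural pattern but names a recipient absent from the allowlist, which is the kind of residual case the abstract reserves for the LLM judge.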