Agent-Sentry: Bounding LLM Agents via Execution Provenance
Agent-Sentry is a runtime defense system that bounds the execution of LLM agents by learning a profile of benign behavior, effectively blocking malicious injections while maintaining high compatibility with legitimate use.
Abstract
More Like ThisAgentic computing systems, while immensely capable, raise serious security, privacy, and safety concerns. A key issue is that the full set of functionalities offered by these systems, combined with their probabilistic execution flows, is not known beforehand. Given this lack of characterization, it is challenging to validate whether a system has successfully carried out the user's intended task or instead executed irrelevant actions, potentially as a consequence of compromise. We present \emph{Agent Sentry}, a runtime defense that learns a bound on an agent's benign execution from prior legitimate executions and flags any action that falls outside this bound. Agent Sentry layers three complementary checks: a structural classifier over the sequence of actions and the provenance of each function's arguments; a deterministic allowlist check over sensitive argument values; and an LLM judge, invoked only on the residual of actions where the first two checks cannot safely decide between a legitimate new request and a carefully crafted injection. We demonstrate the effectiveness of Agent Sentry in AgentDojo and AgentDyn by blocking 94.3\% of successful injections while allowing 95.1\% of benign executions, without modifying the agent, its tools, or the LLM.