Sahar Abdelnabi

5 indexed papers

Top categories

AI ×4, Crypto ×4, NLP ×2, Society ×1, ML ×1

Frequent co-authors

Katharina Deckenbach (1×)

Chris Hicks (1×)

Research Timeline

2026

No More, No Less: Task Alignment in Terminal Agents

The paper introduces the Task Alignment Benchmark (TAB) to evaluate terminal agents' ability to selectively follow relevant environmental instructions while ignoring misleading distractors, revealing a systematic gap between task capability and task alignment.

Hidden in Memory: Sleeper Memory Poisoning in LLM Agents

The paper introduces and evaluates 'sleeper memory poisoning,' a delayed adversarial attack that corrupts an LLM agent's persistent memory by manipulating external context, demonstrating that these poisoned memories can successfully steer future conversations.

AI Agents May Always Fall for Prompt Injections

The paper argues that prompt injection is a fundamental vulnerability in AI agents and that current defenses are insufficient, proposing Contextual Integrity (CI) as a principled framework for understanding and mitigating context-sensitive failures.

Measuring Security Without Fooling Ourselves: Why Benchmarking Agents Is Hard

This paper identifies three core weaknesses (benchmark vulnerabilities, temporal staleness, and runtime uncertainty) that undermine current AI agent security evaluations and proposes directions for building more robust testing frameworks.

Models That Know How Evaluations Are Designed Score Safer

The paper demonstrates that models can acquire 'evaluation meta-knowledge' from training data describing evaluation practices, leading to inflated safety benchmark performance that is independent of explicit memorization.

Papers

cs.CL · cs.AI · Recent · May 27, 2026

Models That Know How Evaluations Are Designed Score Safer

Katharina Deckenbach, Haritz Puerto, Jonas Geiping, Sahar Abdelnabi

cs.CR · cs.AI · Recent · May 21, 2026