~ similar to 2605.28732· 20 results
This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies f…
Hyeonjeong Ha, Jeonghwan Kim, Cheng Qian, Jiayu Liu +6 more
MemGuard introduces a type-aware memory framework to prevent heterogeneous memory contamination in long-term memory-augmented LLMs, significantly improving memory reliability and efficiency.
Huihui Huang, Jieke Shi, Bo Wang, Zhou Yang +1 more
MemHint is a neuro-symbolic static analysis pipeline that significantly improves memory leak detection in C/C++ by combining LLM semantic understanding with Z3 symbolic reasoning, detecting more leaks…
Qingshan Liu, Guoqing Wang, Wen Wu, Jingqi Huang +4 more
MemPro introduces a system-level evolution framework that treats the entire memory construction-retrieval pipeline as an evolvable program, significantly improving long-horizon agent performance over…
Jiajie Fu, Junwen Chen, Mengzhao Wang, Aoxiang He +4 more
The paper introduces VikingMem, a novel Memory Base Management System that effectively manages the persistent state of long-term LLM interactions by selectively extracting, evolving, and compressing m…
The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…
The paper introduces REBench, a comprehensive, standardized benchmark dataset designed to enable fair and rigorous evaluation of Large Language Models (LLMs) on complex binary reverse engineering task…
Haobo Zhang, Xutao Mao, Guangyuan Dong, Ziwei Li +4 more
MemMark introduces a state-evolution attribution watermark that embeds owner-controlled signals into latent memory-write decisions, enabling robust provenance tracking for agent memory even when all t…
The paper introduces MemCog, a Memory-as-Cognition system that integrates memory access directly into the reasoning process, significantly improving agent performance, especially in proactive memory r…
The paper introduces memorywire, a vendor-neutral JSON-Schema wire format and reference implementation designed to standardize and govern memory operations across disparate agent-memory frameworks.
The paper introduces memorywire, a vendor-neutral JSON-Schema 2020-12 wire format and reference implementation to standardize and govern agent memory operations across diverse, proprietary agent-memor…
Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar +1 more
This paper proposes a method to recover recoverability structure from failed traces of post-trained language models, enabling test-time routing and post-training analysis.
The paper proposes a unified framework to evaluate how different types of memory transfer benefit multi-trajectory inference for tool-use LLM agents, finding that the optimal memory method depends cri…
The paper systematically evaluates various defense mechanisms against persistent memory attacks on LLM agents, finding that only tool-gating at the memory layer (Memory Sandbox) effectively mitigates…
The paper introduces the first byte-native Large Language Model (LLM) capable of analyzing raw executable binary data, achieving high accuracy in tasks like malware and architecture classification.
The paper demonstrates that standard LLM evaluation metrics overestimate performance because they fail to account for the stability of outcomes, showing a significant gap between reported pass rates a…
Xuanye Zhang, Yongsen Zheng, Zhuqin Xu, Kaiyu Zhou +4 more
MemMorph introduces a novel memory poisoning attack that biases LLM agent tool selection by injecting crafted records into the agent's long-term memory, achieving high success rates even against moder…
This survey establishes persistent, writable memory as an independent security problem for LLM agents, proposing a comprehensive framework for 'mnemonic sovereignty' to govern the entire memory lifecy…
MemLineage introduces a novel, cryptographically-backed defense mechanism that enforces a chain-of-custody for LLM agent memory, preventing untrusted or poisoned state from justifying sensitive action…
Md Nakhla Rafi, Md Ahasanuzzaman, Dong Jae Kim, Zhijie Wang +1 more
FALAT is a diagnostic framework that treats failure attribution in complex LLM agent trajectories as a dependency-guided search problem, successfully identifying both the responsible agent and the dec…