Testing

Fuzzing, test generation, verification, and software quality

20 papers indexed

cs.CR · Recent · May 14, 2026

PickleFuzzer: A Case Study in Fuzzing for Discrepancies Between Python Pickle Implementations

The paper introduces PickleFuzzer, a custom fuzzer that identifies security-critical discrepancies across different Python pickle implementations, finding 14 new bugs including four that could bypass…

cs.SE · Empirical · Recent · Jul 21, 2026

LLM-Based Invariant Testing for Software Functional Bugs

Ruogu Yang, Yifeng He, Yundi Xu, Yuqing Wei +1 more

LISA is a novel LLM-based invariant testing framework for software functional bugs, achieving higher bug-detection rates and competitive code coverage than fuzzing and prior LLM-based test generation…

cs.SE · cs.CR · cs.PL · Recent · Apr 29, 2026

Adaptive and AI-Augmented Security Testing: A Systematic Survey of Program Analysis, Feedback-Driven Testing, and Hybrid Learning-Based Approaches

Michael Wienczkowski

This paper systematically surveys adaptive and AI-augmented security testing, concluding that a major gap exists—structural-adaptive fragmentation—where current systems fail to integrate structural pr…

cs.CR · cs.SE · Empirical · Recent · Jul 3, 2026

Execution Divergence Graphs: Effective Discovery of Control-Flows from Execution Traces as Fuzzing Feedback

Yu-De Lin, Nils Ole Tippenhauer

This paper proposes approaches for guiding a fuzzer using feedback derived from a control-flow-graph-like structure during the fuzzing of black-box devices and obfuscated compiled binaries.

cs.CR · cs.SE · Recent · May 20, 2026

FuzzingBrain V2: A Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction

Ze Sheng, Zhicheng Chen, Qingxiao Xu, Kewen Zhu +1 more

FuzzingBrain V2 is a multi-agent LLM system that significantly improves automated vulnerability discovery by ensuring all reported bugs are fuzzer-reproducible and handling complex cross-function depe…

cs.CR · cs.AI · cs.LG · Recent · May 10, 2026

Position: AI Security Policy Should Target Systems, Not Models

Michael A. Riegler, Inga Strümke

The paper demonstrates that advanced capabilities, such as jailbreaking large language models and finding software vulnerabilities, can be achieved effectively at zero cost by coordinating multiple sm…

cs.AR · cs.LG · Empirical · Recent · Jul 7, 2026

HiFuzz: Hierarchical Reinforcement Learning for Semantic-Aware and Adaptive CPU Fuzzing

Ya Wang, Hanwei Fan, Zhenguo Liu, Xiaofeng Zhou +3 more

This paper introduces HiFuzz, a hierarchical reinforcement learning framework for processor verification that replaces mutation with a structured generation process and integrates coverage reward mech…

cs.CR · cs.PL · Recent · Apr 20, 2026

SDLLMFuzz: Dynamic-static LLM-assisted greybox fuzzing for structured input programs

Yihao Zou, Tianming Zheng, Futai Zou, Yue Wu

SDLLMFuzz is a novel dynamic-static framework that combines LLM-based structure-aware input generation with semantic feedback from crash analysis to significantly improve vulnerability discovery in st…

cs.CR · cs.SE · Recent · May 20, 2026

Quality-Assured Fuzz Harness Generation via the Four Principles Framework

Ze Sheng, Dmitrijs Trizna, Luigino Camastra, Zhicheng Chen +2 more

The paper introduces QuartetFuzz, an autonomous system that systematically ensures the correctness of fuzzing harnesses using a novel Four Principles framework, significantly improving vulnerability d…

cs.CR · cs.AI · Recent · Apr 28, 2026

From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems

Ignacio Peyrano

The paper proposes a Semantic Gateway and a Zero-Trust security model to formally validate and secure autonomous AI agents operating in enterprise systems, achieving a 100% discovery rate of unauthori…

cs.AI · Recent · May 31, 2026

Before the Model Learns the Bug: Fuzzing RLVR Verifiers

Jaideep Ray

The paper introduces a verifier-fuzzing framework to detect and analyze failure modes in Reinforcement Learning with Verifiable Rewards (RLVR) where bugs in the reward verifier can be exploited by the…

cs.CR · cs.SE · Recent · May 11, 2026

Agentic Fuzzing: Opportunities and Challenges

Junyoung Park, Insu Yun

The paper proposes agentic fuzzing, a novel bug-finding approach where deep agents perform direct reasoning based on historical bugs to discover logic bugs in mature codebases.

cs.CR · cs.LG · Recent · May 26, 2026

SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?

Hwiwon Lee, Jiawei Liu, Dongjun Kim, Ziqi Zhang +2 more

The paper introduces SEC-bench Pro, a rigorous benchmark for evaluating LLM-based bug hunting on complex software, finding that even advanced agents struggle with long-horizon security tasks.

cs.SE · cs.AI · cs.CL · Recent · Apr 13, 2026

AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection

Zijie Zhao, Chenyuan Yang, Weidong Wang, Yihan Yang +2 more

AnyPoC introduces a general multi-agent framework that reliably generates and validates executable Proof-of-Concept (PoC) tests from candidate bug reports, significantly improving automated bug detect…

cs.CR · cs.AI · cs.HC · Empirical · Recent · Jul 26, 2026

The Illusion of Secure LLM Code: Closing the Security Gap via Iterative Reprompting

Ishpuneet Singh, Shreyas Mahajan, Gurjot Singh, Maninder Singh

This paper evaluates the security of authentication code generated by five AI coding assistants using static code analysis and dynamic penetration testing, revealing inconsistent compliance with NIST…

cs.CR · cs.SE · Recent · May 16, 2026

Stop Starving or Stuffing Me: Boosting Firmware Fuzzing Efficiency with On-demand Input Delivery

Shandian Shen, Wei Zhou, Keming Zhao, Peng Liu +2 more

The paper introduces FIDO, a novel framework that significantly boosts firmware fuzzing efficiency by accurately managing the timing and quantity of input delivery based on the firmware's internal inp…

cs.SE · cs.PL · Empirical · Recent · Jul 9, 2026

Toward Inferring Accurate Context-free Grammars for Big Languages in a Black-box Setting

Mohammad Rifat Arefin, Nuhiat Arefin, Shanto Rahman, Christoph Csallner

Xvada introduces new techniques for deterministic context-free grammar inference, improving accuracy and compactness over existing approaches, and discovers vulnerabilities in the Python Liquid engine.

cs.CR · cs.SE · Empirical · Recent · Jun 12, 2026

Security in a Workflow: Exploring Role-Based Agentic Architectures for Vulnerability Handling

Srijita Basu, Miroslaw Staron

This paper proposes a role-based agentic workflow for vulnerability analysis and mitigation in software engineering, integrating an analyzer agent with CodeQL and evaluating its performance on 25 real…

cs.SE · cs.CR · Recent · Mar 28, 2026

Finding Memory Leaks in C/C++ Programs via Neuro-Symbolic Augmented Static Analysis

Huihui Huang, Jieke Shi, Bo Wang, Zhou Yang +1 more

MemHint is a neuro-symbolic static analysis pipeline that significantly improves memory leak detection in C/C++ by combining LLM semantic understanding with Z3 symbolic reasoning, detecting more leaks…

cs.GT · cs.CR · cs.OS · Recent · Apr 9, 2026

VCAO: Verifier-Centered Agentic Orchestration for Strategic OS Vulnerability Discovery

Suyash Mishra

The paper introduces VCAO, a novel verifier-centered agentic orchestration framework that models OS vulnerability discovery as a Bayesian Stackelberg game, significantly improving vulnerability discov…
