Papers similar to 2604.19012v1

Similar to 2604.19012v1 · 20 results

cs.CR · cs.AI · cs.SE · May 5, 2026

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…

cs.CR · Apr 3, 2026

ContractShield: Bridging Semantic-Structural Gaps via Hierarchical Cross-Modal Fusion for Multi-Label Vulnerability Detection in Obfuscated Smart Contracts

Minh-Dai Tran-Duong, Nguyen Hai Phong, Nguyen Chi Thanh, Doan Minh Trung +3 more

ContractShield is a robust multimodal framework that uses a novel three-level fusion mechanism to accurately detect multiple types of vulnerabilities in obfuscated smart contracts, significantly outpe…

cs.CR · cs.LG · May 26, 2026

SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?

Hwiwon Lee, Jiawei Liu, Dongjun Kim, Ziqi Zhang +2 more

The paper introduces SEC-bench Pro, a rigorous benchmark for evaluating LLM-based bug hunting on complex software, finding that even advanced agents struggle with long-horizon security tasks.

cs.CR · cs.LG · cs.SE · Apr 23, 2026

Strategic Heterogeneous Multi-Agent Architecture for Cost-Effective Code Vulnerability Detection

Zhaohui Geoffrey Wang

The paper proposes a novel '3+1' heterogeneous multi-agent architecture using cloud LLMs and a local verifier to achieve high-accuracy, cost-effective code vulnerability detection, significantly outpe…

cs.CR · cs.SE · Empirical · Jul 2, 2026

Knowledge Over Parameters: Evolving Smart Contract Vulnerability Detection

Yuqiang Sun, Han Liu, Ying Li, Yiran Zhang +3 more

This paper presents EvoVuln, an automated framework that synthesizes and refines detection logic for smart contract vulnerabilities using minimal labeled samples.

cs.CR · Apr 22, 2026

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

Hanzhi Liu, Chaofan Shou, Xiaonan Liu, Hongbo Wen +3 more

The paper introduces AgentFlow, a novel framework that uses a typed graph DSL and feedback-driven optimization to automatically synthesize and improve multi-agent harnesses for discovering security vu…

cs.CR · cs.AI · cs.MA · Apr 20, 2026

RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs

Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more

The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…

cs.SE · cs.AI · cs.CR · Apr 12, 2026

Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis

Jugal Gajjar

The paper introduces an execution-grounded, cross-language framework that significantly improves the reliability of LLM-driven code vulnerability analysis by ensuring that all proposed fixes are confi…

cs.CR · cs.SE · Mar 31, 2026

When Labels Are Scarce: A Systematic Mapping of Label-Efficient Code Vulnerability Detection

Noor Khalal, Chakib Fettal, Lazhar Labiod, Mohamed Nadif

This systematic mapping survey reviews label-efficient approaches for code vulnerability detection, synthesizing five paradigm families and providing a decision guide to navigate trade-offs.

cs.CR · cs.LG · Apr 17, 2026

Surgical Repair of Insecure Code Generation in LLMs

Gustavo Sandoval, Brendan Dolan-Gavitt, Siddharth Garg

This paper identifies the 'Format-Reliability Gap'—where LLMs know about code vulnerabilities but generate insecure code anyway—and proposes a localized, per-vulnerability steering vector fix that sig…

cs.CR · cs.CL · cs.SE · May 28, 2026

Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs

Alexander Sternfeld, Andrei Kucharavy, Ljiljana Dolamic

Minor, single-character perturbations to prompts can significantly degrade the security of code generated by LLMs, suggesting that prompt fragility is a major security concern beyond simple prompt inj…

cs.CR · cs.CL · cs.CY · May 8, 2026

SecureForge: Finding and Preventing Vulnerabilities in LLM-Generated Code via Prompt Optimization

Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann +4 more

SecureForge is an automated pipeline that significantly reduces cybersecurity vulnerabilities in LLM-generated code by optimizing system prompts, achieving up to a 48% reduction in output vulnerabilit…

cs.CR · cs.LG · May 28, 2026

Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection

Syafiq Al Atiiq, Chun Zhou, Christian Gehrmann

The paper analyzes LLM vulnerability detection using mechanistic interpretability, finding that models primarily rely on safety detectors rather than direct vulnerability signature recognition.

cs.CR · cs.AI · cs.SE · Mar 27, 2026

Knowdit: Agentic Smart Contract Vulnerability Detection with Auditing Knowledge Summarization

Ziqiao Kong, Wanxu Xia, Chong Wang, Yi Lu +4 more

Knowdit is a knowledge-driven, agentic framework that significantly improves smart contract vulnerability detection by modeling shared DeFi semantics and leveraging historical audit knowledge.

cs.SE · cs.CR · cs.LG · May 13, 2026

Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study

Nils Loose, Joseph Bienhüls, Kristoffer Hempel, Felix Mächtle +1 more

The paper evaluates code language model-based detection of vulnerability-fixing commits (VFCs) using a unified benchmark and concludes that code changes alone are insufficient for accurate detection,…

cs.CR · cs.SE · May 4, 2026

EvoPoC: Automated Exploit Synthesis for DeFi Smart Contracts via Hierarchical Knowledge Graphs

Ruichao Liang, Jing Chen, Xianglong Li, Huangpeng Gu +4 more

EvoPoC introduces a knowledge-driven agentic system that automates the synthesis of verifiable and economically viable exploits for DeFi smart contracts, achieving high recall and significant revenue…

cs.CR · Apr 27, 2026

GoAT-X: A Graph of Auditing Thoughts for Securing Token Transactions in Cross-Chain Contracts

Zijun Feng, Yuming Feng, Yu Wang, Weizhe Zhang +3 more

GoAT-X introduces a novel framework that structures cross-chain smart contract auditing as a Graph of Auditing Thoughts, significantly improving the detection of complex, semantic vulnerabilities in m…

cs.CR · cs.AI · Apr 2, 2026

From Theory to Practice: Code Generation Using LLMs for CAPEC and CWE Frameworks

Murtuza Shahzad, Joseph Wilson, Ibrahim Al Azher, Hamed Alhoori +1 more

The paper introduces a novel, large-scale dataset of vulnerable code snippets linked to CAPEC and CWE, generated using advanced LLMs, to improve automatic vulnerability detection.

cs.CR · cs.SE · Mar 22, 2026

Zero-Shot Vulnerability Detection in Low-Resource Smart Contracts Through Solidity-Only Training

Minghao Hu, Qiang Zeng, Lannan Luo

The paper introduces Sol2Vy, a framework that enables cross-language knowledge transfer from Solidity to Vyper, allowing effective vulnerability detection in low-resource smart contracts without needi…

cs.AI · cs.CR · May 12, 2026

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung +2 more

The paper introduces BenchJack, an automated red-teaming system that systematically audits popular AI agent benchmarks, revealing numerous reward-hacking exploits and demonstrating a method to signifi…