20 results for “software engineering”
CS papers only. Hybrid search: keyword + semantic, ranked by combined score.
This paper introduces ASE-26, a comprehensive undergraduate curriculum designed to formalize and teach agentic software engineering as a distinct academic discipline.
The paper proposes projectional decoding, a novel framework that integrates a partial graph model alongside text generation to ensure the semantic validity of LLM-generated software artifacts.
This paper proposes a role-based agentic workflow for vulnerability analysis and mitigation in software engineering, integrating an analyzer agent with CodeQL and evaluating its performance on 25 real…
This paper studies AI development frameworks for software engineering and proposes a six-dimension process taxonomy.
SEMBridge is a tagless-final framework that allows a single executable object program to generate multiple program semantics, including weakest-precondition and bounded-checking interpretations, ensur…
The paper argues that current 'on-the-fly' AI agent design lacks necessary software engineering rigor and proposes an 'AI Workflow Store' to provide hardened, reusable, and reliable agent workflows.
The paper argues that current Software Bills of Materials (SBOMs) are fundamentally flawed due to a lack of shared understanding regarding what constitutes a 'component,' demonstrating that existing t…
The paper introduces TRAILS, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
Ningzhi Tang, Chaoran Chen, Gelei Xu, Yiyu Shi +4 more
This study analyzes over 20,000 real-world coding sessions to show that AI coding agents frequently fail users through subtle misalignment, requiring constant manual correction even when major system…
Han Dai, Soumyakant Priyadarshan, Abdullah Imran, Ruoyu Wang +1 more
SCRIBE is a novel framework that enables reliable source-level patching of binaries by performing 'binary-aware' recompilation, successfully resolving syntactic and semantic inaccuracies inherent in d…
Larissa Schmid, Diogo Gaspar, Raphina Liu, Sofia Bobadilla +2 more
The paper introduces 'software supply chain smells,' structural indicators of security risks in third-party dependencies, and presents Dirty-Waters, a tool that detects these smells, finding that diff…
This study provides an ecosystem-scale measurement of commit signing on GitHub, finding that current signing adoption rates are misleading and that developers struggle to maintain consistent, long-ter…
Jun Zhang, JianYing Qu, Hanwen Du, Zhongkai Sun +2 more
The paper introduces Code-QA-Bench, a novel framework that rigorously separates genuine code reasoning from mere documentation memorization in repository-level code understanding benchmarks.
Hulin Wang, Zion Leonahenahe Basque, Jie Hu, Ati Priya Bajaj +12 more
The paper introduces Kumushi, a root-cause-driven patching agent that significantly improves automated vulnerability repair by focusing LLMs on the true source of bugs, outperforming existing methods…
Code2LoRA introduces a hypernetwork framework to efficiently inject repository-specific knowledge into code language models using LoRA adapters, supporting both static and evolving codebases.
The paper introduces FVSpec, a large-scale benchmark that translates thousands of real-world Python property-based tests into formal Lean 4 specifications to evaluate AI models for formal software ver…
Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao +6 more
The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmfu…
The paper proposes a new binary format that embeds compiler-generated metadata into executables, making the binary structure more transparent and enabling reliable analysis, instrumentation, and recom…
Nils Loose, Joseph Bienhüls, Kristoffer Hempel, Felix Mächtle +1 more
The paper evaluates code language model-based detection of vulnerability-fixing commits (VFCs) using a unified benchmark and concludes that code changes alone are insufficient for accurate detection,…
This study conducts a large-scale longitudinal analysis of CodeQL, finding that while the tool is effective at detecting vulnerabilities, its detection capabilities are not guaranteed to be stable acr…