ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

20 results for “software engineering”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.ⓘ

Want pure semantic search? Try claim verification →

cs.CYcs.AIcs.SERecentMay 31, 2026

ASE-26: a curriculum for agentic software engineering as a discipline

Mikael Gorsky

This paper introduces ASE-26, a comprehensive undergraduate curriculum designed to formalize and teach agentic software engineering as a distinct academic discipline.

View →
cs.SEcs.AIRecentMay 28, 2026

Projectional Decoding: Towards Semantic-Aware LLM Generation

Boqi Chen, José Antonio Hernández López, Aren A. Babikian

The paper proposes projectional decoding, a novel framework that integrates a partial graph model alongside text generation to ensure the semantic validity of LLM-generated software artifacts.

View →
cs.CRcs.SEEmpiricalRecentJun 12, 2026

Security in a Workflow: Exploring Role-Based Agentic Architectures for Vulnerability Handling

Srijita Basu, Miroslaw Staron

This paper proposes a role-based agentic workflow for vulnerability analysis and mitigation in software engineering, integrating an analyzer agent with CodeQL and evaluating its performance on 25 real…

View →
cs.SEcs.AIRecentJun 3, 2026

From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents

Sanderson Oliveira de Macedo

This paper studies AI development frameworks for software engineering and proposes a six-dimension process taxonomy.

View →
cs.PLcs.AIRecentMay 29, 2026

SEMBridge: Tagless-Final Program Semantics with Weakest-Precondition and Bounded-Checking Interpretations

Eric Liang

SEMBridge is a tagless-final framework that allows a single executable object program to generate multiple program semantics, including weakest-precondition and bounded-checking interpretations, ensur…

View →
cs.CRcs.AIRecentMay 11, 2026

Engineering Robustness into Personal Agents with the AI Workflow Store

Roxana Geambasu, Mariana Raykova, Pierre Tholoniat, Trishita Tiwari +2 more

The paper argues that current 'on-the-fly' AI agent design lacks necessary software engineering rigor and proposes an 'AI Workflow Store' to provide hardened, reusable, and reliable agent workflows.

View →
cs.SEcs.CRRecentJun 1, 2026

Poking Around in the Dark: Why a Shared Understanding of Components Matters

Felix Reichmann, Wolfgang Krane, Alena Naiakshina, Martin Johns +1 more

The paper argues that current Software Bills of Materials (SBOMs) are fundamentally flawed due to a lack of shared understanding regarding what constitutes a 'component,' demonstrating that existing t…

View →
cs.SEcs.AIRecentMay 28, 2026

Inferring Code Correctness from Specification

Tambon Florian, Papadakis Mike

The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…

View →
cs.SEcs.AIcs.HCRecentMay 28, 2026

How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions

Ningzhi Tang, Chaoran Chen, Gelei Xu, Yiyu Shi +4 more

This study analyzes over 20,000 real-world coding sessions to show that AI coding agents frequently fail users through subtle misalignment, requiring constant manual correction even when major system…

View →
cs.CRcs.SERecentMay 4, 2026

SCRIBE: Practical Static Binary Patching via Binary-Aware Recompilation of Decompiled Code

Han Dai, Soumyakant Priyadarshan, Abdullah Imran, Ruoyu Wang +1 more

SCRIBE is a novel framework that enables reliable source-level patching of binaries by performing 'binary-aware' recompilation, successfully resolving syntactic and semantic inaccuracies inherent in d…

View →
cs.SEcs.CRRecentMar 25, 2026

Software Supply Chain Smells: Lightweight Analysis for Secure Dependency Management

Larissa Schmid, Diogo Gaspar, Raphina Liu, Sofia Bobadilla +2 more

The paper introduces 'software supply chain smells,' structural indicators of security risks in third-party dependencies, and presents Dirty-Waters, a tool that detects these smells, finding that diff…

View →
cs.SEcs.CRRecentApr 15, 2026

Analysis of Commit Signing on Github

Abubakar Sadiq Shittu, John Sadik, Farzin Gholamrezae, Scott Ruoti

This study provides an ecosystem-scale measurement of commit signing on GitHub, finding that current signing adoption rates are misleading and that developers struggle to maintain consistent, long-ter…

View →
cs.SEcs.AIRecentMay 28, 2026

Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA

Jun Zhang, JianYing Qu, Hanwen Du, Zhongkai Sun +2 more

The paper introduces Code-QA-Bench, a novel framework that rigorously separates genuine code reasoning from mere documentation memorization in repository-level code understanding benchmarks.

View →
cs.CRcs.SERecentMay 5, 2026

Root-Cause-Driven Automated Vulnerability Repair

Hulin Wang, Zion Leonahenahe Basque, Jie Hu, Ati Priya Bajaj +12 more

The paper introduces Kumushi, a root-cause-driven patching agent that significantly improves automated vulnerability repair by focusing LLMs on the true source of bugs, outperforming existing methods…

View →
cs.SEcs.AIcs.CLRecentJun 4, 2026

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Liliana Hotsko, Yinxi Li, Yuntian Deng, Pengyu Nie

Code2LoRA introduces a hypernetwork framework to efficiently inject repository-specific knowledge into code language models using LoRA adapters, supporting both static and evolving codebases.

View →
cs.SEcs.AIRecentMay 31, 2026

FVSpec: Real-World Property-Based Tests as Lean Challenges

Quinn Dougherty, Max von Hippel, Hazel Shackleton, Mike Dodds

The paper introduces FVSpec, a large-scale benchmark that translates thousands of real-world Python property-based tests into formal Lean 4 specifications to evaluate AI models for formal software ver…

View →
cs.SEcs.CRRecentMay 31, 2026

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao +6 more

The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmfu…

View →
cs.CRcs.PLRecentApr 21, 2026

Adding Compilation Metadata To Binaries To Make Disassembly Decidable

Daniel Engel, Freek Verbeek, Pranav Kumar, Binoy Ravindran

The paper proposes a new binary format that embeds compiler-generated metadata into executables, making the binary structure more transparent and enabling reliable analysis, instrumentation, and recom…

View →
cs.SEcs.CRcs.LGRecentMay 13, 2026

Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study

Nils Loose, Joseph Bienhüls, Kristoffer Hempel, Felix Mächtle +1 more

The paper evaluates code language model-based detection of vulnerability-fixing commits (VFCs) using a unified benchmark and concludes that code changes alone are insufficient for accurate detection,…

View →
cs.CRRecentMay 8, 2026

Longitudinal Analyses of SAST Tools: A CodeQL Case Study

Jean-Charles Noirot Ferrand, Kyle Domico, Yohan Beugin, Patrick McDaniel

This study conducts a large-scale longitudinal analysis of CodeQL, finding that while the tool is effective at detecting vulnerabilities, its detection capabilities are not guaranteed to be stable acr…

View →