Papers similar to 2606.02167

~ similar to 2606.02167· 20 results

cs.AIRecentMay 27, 2026

An LLM-Based Assistance System for Intuitive and Flexible Capability-Based Planning

Luis Miguel Vieira da Silva, Nicolas König, Felix Gehlhoff

The paper proposes a hybrid LLM-based assistance system that enhances traditional capability-based planning by providing natural language interaction, interpretability, and flexible knowledge model ad…

View →

cs.CLRecentMay 31, 2026

Robust Asynchronous Planning via Auto-Formalization

Jiayi Zhang, Jianing Yin, Ben Zhou, Li Zhang

The paper introduces new benchmarks for complex asynchronous planning and demonstrates that general constraint satisfaction formalizers (like CP-SAT) significantly outperform direct LLM planning or tr…

View →

cs.HCcs.AIRecentMay 31, 2026

pcbGPT: Automatic PCB Schematic Synthesis from Natural Language Requirements

Tobias King, Steven Kehrberg, Michael Beigl, Tobias Röddiger

pcbGPT is a grounded system that automatically generates editable KiCad PCB schematics from natural language requirements, achieving high accuracy on complex embedded design tasks.

View →

cs.AIRecentMay 27, 2026

MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation

Xiaoyu Dong, Zhi Li, Xiao-Ming Wu

The paper introduces MUSE, a comprehensive benchmark that evaluates Text-to-CAD generation by assessing complex assemblies based on functionality, manufacturability, and assemblability, moving beyond…

View →

cs.CRcs.AIRecentMay 10, 2026

Governing AI-Assisted Security Operations: A Design Science Framework for Operational Decision Support

Elyson A. De La Cruz, Rishikesh Sahay, Md Rasel Al Mamun

The paper proposes a management framework, using a governed AI query-broker artifact, to safely integrate generative AI into high-risk operational decision support, such as Security Operations Centers…

View →

cs.SEcs.AIRecentMay 27, 2026

Tool Forge: A Validation-Carrying Toolchain for Governed Agentic Execution

Swanand Rao

Tool Forge is a validation-carrying toolchain that converts natural language capability intent into governed, sandbox-verified tool artifacts, significantly improving agent efficiency and reliability.

View →

cs.CRcs.AIcs.ETRecentMar 19, 2026

PlanTwin: Privacy-Preserving Planning Abstractions for Cloud-Assisted LLM Agents

Guangsheng Yu, Qin Wang, Rui Lang, Shuai Su +1 more

PlanTwin introduces a privacy-preserving architecture that allows cloud-hosted LLMs to plan over sensitive local environments by projecting the raw state into a sanitized, abstract digital twin.

View →

cs.AIRecentMay 28, 2026

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

Elliot Gestrin, Jendrik Seipp

This paper introduces the first LLM-generated, domain-independent heuristics for symbolic AI planning, using evolutionary search to surpass the performance of hand-engineered state-of-the-art methods.

View →

cs.CRcs.AIRecentApr 7, 2026

CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments

Gustav Keppler, Moritz Gstür, Veit Hagenmeyer

The paper introduces CritBench, a novel framework to evaluate LLM cybersecurity capabilities specifically within IEC 61850 Digital Substation Operational Technology (OT) environments, finding that whi…

View →

cs.CRcs.AIRecentApr 7, 2026

From Incomplete Architecture to Quantified Risk: Multimodal LLM-Driven Security Assessment for Cyber-Physical Systems

Shaofei Huang, Christopher M. Poskitt, Lwin Khin Shar

The paper introduces ASTRAL, a multimodal LLM-driven framework that reconstructs and analyzes fragmented cyber-physical system architectures to enable comprehensive and quantitative security risk asse…

View →

cs.CRRecentApr 2, 2026

Assertain: Automated Security Assertion Generation Using Large Language Models

Shams Tarek, Dipayan Saha, Khan Thamid Hasan, Sujan Kumar Saha +2 more

Assertain is an automated framework that uses large language models and design analysis to generate high-quality, executable security assertions for hardware designs, significantly outperforming state…

View →

cs.CRcs.SERecentMar 18, 2026

Guardrails as Infrastructure: Policy-First Control for Tool-Orchestrated Workflows

Akshey Sigdel, Rista Baral

The paper introduces Policy-First Tooling, a model-agnostic permission layer that significantly enhances the safety and reliability of tool-orchestrated AI workflows by enforcing explicit constraints…

View →

cs.CRcs.AIRecentMay 11, 2026

Engineering Robustness into Personal Agents with the AI Workflow Store

Roxana Geambasu, Mariana Raykova, Pierre Tholoniat, Trishita Tiwari +2 more

The paper argues that current 'on-the-fly' AI agent design lacks necessary software engineering rigor and proposes an 'AI Workflow Store' to provide hardened, reusable, and reliable agent workflows.

View →

cs.NIcs.AIcs.CRRecentMay 12, 2026

Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety

Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu +1 more

The paper surveys the use of LLMs for agentic NetOps and AIOps, arguing that operational reliability depends not on the model itself, but on robust surrounding machinery and workflow-centered evaluati…

View →

cs.AIcs.LGRecentMay 30, 2026

MOSAIC: Modular Orchestration for Structured Agentic Intelligence and Composition

Yifan Bao, Xinyu Xi, Xinyu Liu, Wen Ge +7 more

MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…

View →

cs.CLRecentJun 1, 2026

Scaling Agentic Capabilities via Grounded Interaction Synthesis

Wenhang Shi, Jinhao Dong, Yiren Chen, Zhe Zhao +3 more

The paper introduces Grounded Agentic Interaction Synthesis (GAIS), a framework that generates high-quality, diverse, and complex agentic training data by anchoring tasks to real-world protocols, sign…

View →

cs.OScs.AIcs.CRRecentJun 2, 2026

Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

Yingqi Zhang

Agent libOS introduces a library-OS-inspired runtime substrate that treats LLM agents as schedulable processes, providing explicit capability control and robust auditing for long-running, stateful age…

View →

cs.SEcs.AIRecentMay 27, 2026

DeltaMCP: Incremental Regeneration via Spec-Aware Transformation for MCP servers

Aditya Pujara, Xiaogang Zhu, Hsiang-Ting Chen

DeltaMCP is a specification-aware, incremental regeneration tool that efficiently updates Model Context Protocol (MCP) servers by only modifying affected tooling when a service's OpenAPI specification…

View →

cs.CRcs.AIRecentApr 15, 2026

Challenges and Future Directions in Agentic Reverse Engineering Systems

Salem Radey, Jack West, Kassem Fawaz

This paper analyzes the performance of agentic LLM systems in complex binary reverse engineering, identifying key limitations such as handling obfuscation and token constraints, and proposing future d…

View →

cs.CRcs.AIRecentApr 28, 2026

From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems

Ignacio Peyrano

The paper proposes a Semantic Gateway and a Zero-Trust security model to formally validate and secure autonomous AI agents operating in enterprise systems, achieving a 100% discovery rate of unauthori…

View →