~ similar to 2606.02167· 20 results
The paper proposes a hybrid LLM-based assistance system that enhances traditional capability-based planning by providing natural language interaction, interpretability, and flexible knowledge model ad…
The paper introduces new benchmarks for complex asynchronous planning and demonstrates that general constraint satisfaction formalizers (like CP-SAT) significantly outperform direct LLM planning or tr…
pcbGPT is a grounded system that automatically generates editable KiCad PCB schematics from natural language requirements, achieving high accuracy on complex embedded design tasks.
The paper introduces MUSE, a comprehensive benchmark that evaluates Text-to-CAD generation by assessing complex assemblies based on functionality, manufacturability, and assemblability, moving beyond…
The paper proposes a management framework, using a governed AI query-broker artifact, to safely integrate generative AI into high-risk operational decision support, such as Security Operations Centers…
Tool Forge is a validation-carrying toolchain that converts natural language capability intent into governed, sandbox-verified tool artifacts, significantly improving agent efficiency and reliability.
Guangsheng Yu, Qin Wang, Rui Lang, Shuai Su +1 more
PlanTwin introduces a privacy-preserving architecture that allows cloud-hosted LLMs to plan over sensitive local environments by projecting the raw state into a sanitized, abstract digital twin.
This paper introduces the first LLM-generated, domain-independent heuristics for symbolic AI planning, using evolutionary search to surpass the performance of hand-engineered state-of-the-art methods.
The paper introduces CritBench, a novel framework to evaluate LLM cybersecurity capabilities specifically within IEC 61850 Digital Substation Operational Technology (OT) environments, finding that whi…
The paper introduces ASTRAL, a multimodal LLM-driven framework that reconstructs and analyzes fragmented cyber-physical system architectures to enable comprehensive and quantitative security risk asse…
Shams Tarek, Dipayan Saha, Khan Thamid Hasan, Sujan Kumar Saha +2 more
Assertain is an automated framework that uses large language models and design analysis to generate high-quality, executable security assertions for hardware designs, significantly outperforming state…
The paper introduces Policy-First Tooling, a model-agnostic permission layer that significantly enhances the safety and reliability of tool-orchestrated AI workflows by enforcing explicit constraints…
The paper argues that current 'on-the-fly' AI agent design lacks necessary software engineering rigor and proposes an 'AI Workflow Store' to provide hardened, reusable, and reliable agent workflows.
Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu +1 more
The paper surveys the use of LLMs for agentic NetOps and AIOps, arguing that operational reliability depends not on the model itself, but on robust surrounding machinery and workflow-centered evaluati…
MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…
Wenhang Shi, Jinhao Dong, Yiren Chen, Zhe Zhao +3 more
The paper introduces Grounded Agentic Interaction Synthesis (GAIS), a framework that generates high-quality, diverse, and complex agentic training data by anchoring tasks to real-world protocols, sign…
Agent libOS introduces a library-OS-inspired runtime substrate that treats LLM agents as schedulable processes, providing explicit capability control and robust auditing for long-running, stateful age…
DeltaMCP is a specification-aware, incremental regeneration tool that efficiently updates Model Context Protocol (MCP) servers by only modifying affected tooling when a service's OpenAPI specification…
This paper analyzes the performance of agentic LLM systems in complex binary reverse engineering, identifying key limitations such as handling obfuscation and token constraints, and proposing future d…
The paper proposes a Semantic Gateway and a Zero-Trust security model to formally validate and secure autonomous AI agents operating in enterprise systems, achieving a 100% discovery rate of unauthori…