Papers similar to 2605.05501v1

~ similar to 2605.05501v1· 20 results

cs.CRcs.AIcs.IRRecentApr 30, 2026

Toward Autonomous SOC Operations: End-to-End LLM Framework for Threat Detection, Query Generation, and Resolution in Security Operations

The paper proposes an end-to-end LLM framework that automates SOC operations by integrating ensemble-based threat detection, syntax-constrained query generation, and evidence-grounded incident resolut…

View →

cs.CRcs.AIRecentApr 21, 2026

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

Alankrit Chona, Igor Kozlov, Ambuj Kumar

The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

George Fatouros, Georgios Makridis, George Kousiouris, John Soldatos +1 more

The paper proposes an organization-scoped LLM agent runtime architecture designed to provide an auditable, model-agnostic platform for regulated cybersecurity operations, integrating deeply with exist…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

George Fatouros, Georgios Makridis, George Kousiouris, John Soldatos +1 more

The paper proposes a novel, organization-scoped LLM agent runtime architecture designed specifically for regulated cybersecurity operations, ensuring auditable context and integration with existing se…

View →

cs.CRRecentMar 18, 2026

The Verifier Tax: Horizon Dependent Safety Success Tradeoffs in Tool Using LLM Agents

Tanmay Sah, Vishal Srivastava, Dolly Sah, Kayden Jordan

The paper analyzes how runtime safety enforcement impacts the performance of multi-step LLM agents, finding that while safety mechanisms can block unsafe actions, they impose a significant performance…

View →

cs.CRcs.AIRecentApr 11, 2026

Like a Hammer, It Can Build, It Can Break: Large Language Model Uses, Perceptions, and Adoption in Cybersecurity Operations on Reddit

Souradip Nath, Chih-Yi Huang, Aditi Ganapathi, Kashyap Thimmaraju +2 more

Analyzing Reddit discussions, the paper finds that while security practitioners see LLMs as useful for boosting productivity, their adoption is constrained by concerns over reliability, verification,…

View →

cs.CRRecentMay 7, 2026

Autonomous Adversary: Red-Teaming in the age of LLM

Mohammad Mamun, Mohamed Gaber, Scott Buffett, Sherif Saad

The paper evaluates Language Model Agents (LMAs) for red-teaming by benchmarking their ability to perform lateral movement, finding that expert-defined action plans are most effective, though all moda…

View →

cs.CRcs.AIRecentMar 25, 2026

Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

Rishikesh Sahay, Bell Eapen, Weizhi Meng, Md Rasel Al Mamun +4 more

The paper proposes an automated, LLM-enabled threat hunting framework integrated with Splunk to help SOC analysts autonomously monitor evolving threats and prioritize suspicious network traffic.

View →

cs.CRRecentMay 9, 2026

When LLMs Team Up: A Coordinated Attack Framework for Automated Cyber Intrusions

Minfeng Qi, Tianqing Zhu, Zijie Xu, Congcong Zhu +2 more

The paper introduces CAESAR, a novel multi-agent framework that coordinates LLM agents across five specialized roles to improve success rates and stability in complex, multi-stage cyber intrusion task…

View →

cs.CRcs.CVRecentMar 18, 2026

Toward Reliable, Safe, and Secure LLMs for Scientific Applications

Saket Sanjeev Chaturvedi, Joshua Bergerson, Tanwi Mallick

This paper addresses the critical need for trustworthy LLMs in science by proposing a comprehensive, multi-layered defense framework and methodology to evaluate unique scientific vulnerabilities.

View →

cs.CRcs.AIRecentApr 6, 2026

Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework

Jiling Zhou, Aisvarya Adeseye, Seppo Virtanen, Antti Hakkala +1 more

The paper proposes a structured prompt engineering framework to enhance the integrity and reliability of Chain-of-Thought (CoT) reasoning in LLMs, demonstrating significant improvements in security-se…

View →

cs.NIcs.AIcs.CRRecentMay 12, 2026

Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety

Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu +1 more

The paper surveys the use of LLMs for agentic NetOps and AIOps, arguing that operational reliability depends not on the model itself, but on robust surrounding machinery and workflow-centered evaluati…

View →

cs.LOcs.AIcs.CRRecentApr 1, 2026

Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving

Devakh Rashie, Veda Rashi

The paper introduces the Lean-Agent Protocol, a formal verification platform that uses Lean 4 theorem proving to ensure agentic AI actions in finance are mathematically compliant with complex regulati…

View →

cs.CRcs.AIRecentApr 22, 2026

CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge

Gustav Keppler, Ghada Elbez, Veit Hagenmeyer

The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…

View →

cs.CRcs.AIRecentApr 25, 2026

Semantic Denial of Service in LLM-controlled robots

Jonathan Steinberg, Oren Gal

The paper demonstrates a semantic denial-of-service attack against LLM-controlled robots by injecting short, safety-plausible phrases into the audio channel, causing the robot to halt or disrupt execu…

View →

cs.CRcs.AIcs.OSRecentApr 18, 2026

Governed MCP: Kernel-Level Tool Governance for AI Agents via Logit-Based Safety Primitives

Daeyeon Son

The paper introduces Governed MCP, a kernel-resident gateway that enforces comprehensive, robust tool governance for AI agents' privileged tool calls, significantly improving safety beyond userspace m…

View →

cs.CRcs.MARecentApr 15, 2026

SoK: Security of Autonomous LLM Agents in Agentic Commerce

Qian'ang Mao, Jiaxin Wang, Ya Liu, Li Zhu +2 more

The paper develops a unified, cross-layer security framework for autonomous LLM agents operating in agentic commerce, identifying key attack vectors and proposing a layered defense architecture.

View →

cs.CRcs.AIRecentMay 17, 2026

Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications

Isaac David, Arthur Gervais

The paper proposes Ablating Safety, a controlled protocol for removing safety alignment from language models, demonstrating that targeted de-alignment can significantly boost security performance whil…

View →

cs.CRcs.LGRecentMay 23, 2026

Poisoning the Watchtower: Prompt Injection Attacks Against LLM-Augmented Security Operations Through Adversarial Log Content

Rohan Pandey, Archit Bhujang

The paper introduces 'log-substrate prompt injection,' demonstrating that attacker-controlled log fields can be used to manipulate LLM-powered security analysis, with persona hijacking and context man…

View →

cs.CLRecentMay 29, 2026

ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails

Yan Wang, Zhixuan Chu, Zihao Xue, Zhen Bi +8 more

The paper introduces ConsisGuard, a framework that addresses the 'deliberation-to-enforcement gap' in LLM guardrails by ensuring that the reasoning process is faithfully and consistently translated in…

View →