ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

20 results for “LTLsafe”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.ⓘ

Want pure semantic search? Try claim verification →

cs.CLRecentMay 29, 2026

ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails

Yan Wang, Zhixuan Chu, Zihao Xue, Zhen Bi +8 more

The paper introduces ConsisGuard, a framework that addresses the 'deliberation-to-enforcement gap' in LLM guardrails by ensuring that the reasoning process is faithfully and consistently translated in…

View →
cs.CRcs.AIcs.CLRecentApr 8, 2026

TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen

The paper introduces TraceSafe-Bench, a comprehensive benchmark, and finds that securing LLM agents requires jointly optimizing for structural reasoning and safety alignment to mitigate risks during m…

View →
cs.AIcs.CRRecentMay 6, 2026

AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use

Chenglin Yang

AgentTrust is a novel runtime safety layer that intercepts and evaluates AI agent tool calls before execution, achieving high accuracy in detecting unsafe actions across complex and obfuscated scenari…

View →
cs.CRcs.PLcs.SERecentApr 28, 2026

Symbolic Execution Meets Multi-LLM Orchestration: Detecting Memory Vulnerabilities in Incomplete Rust CVE Snippets

Zeyad Abdelrazek, Young Lee

The paper introduces a novel multi-LLM orchestration system combined with symbolic execution to successfully detect memory vulnerabilities in uncompilable, incomplete Rust CVE code snippets, achieving…

View →
cs.CRcs.AIcs.SERecentMay 12, 2026

Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization

Ying Li, Yanju Chen, Peiran Wang, Issac Khabra +3 more

The paper introduces Conleash, a client-side middleware that uses a risk lattice to enforce granular, boundary-scoped authorization for tool invocations, significantly improving user consent and secur…

View →
cs.CRRecentMar 18, 2026

LAAF: Logic-layer Automated Attack Framework A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

Hammad Atta, Ken Huang, Kyriakos Rock Lambros, Yasir Mehmood +10 more

The paper introduces LAAF, a novel automated red-teaming framework, to systematically test and exploit Logic-layer Prompt Control Injection (LPCI) vulnerabilities in complex agentic LLM systems.

View →
cs.CRRecentMay 12, 2026

Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis

Zhenhao Xu, Wenhan Chang, Yichuan Chen, Yuxin Fang +2 more

The paper proposes Safety Context Injection (SCI), an inference-time framework that prepends a structured external risk report to protect Large Reasoning Models (LRMs) against sophisticated jailbreaks…

View →
cs.SEcs.CRRecentMay 31, 2026

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao +6 more

The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmfu…

View →
cs.SEcs.CRRecentMay 11, 2026

AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification

Paschal C. Amusuo, Ricardo Calvo, Dharun Anandayuvaraj, Taylor Le Lievre +4 more

AutoSOUP is a system that automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs, leveraging a hybrid LLM-based architecture to overcome manual workflow limitat…

View →
cs.SEcs.CRRecentMar 18, 2026

Who Tests the Testers? Systematic Enumeration and Coverage Audit of LLM Agent Tool Call Safety

Xuan Chen, Lu Yan, Ruqi Zhang, Xiangyu Zhang

The paper introduces SafeAudit, a meta-audit framework that systematically enumerates test cases and uses a quantitative metric to uncover significant residual unsafe behaviors in LLM agents that exis…

View →
cs.CLcs.CRRecentMay 8, 2026

GLiGuard: Schema-Conditioned Classification for LLM Safeguard

Urchade Zaratiana, Mary Newhauser, George Hurn-Maloney, Ash Lewis

GLiGuard introduces a compact, schema-conditioned bidirectional encoder that achieves state-of-the-art performance in LLM content moderation across multiple safety dimensions while drastically reducin…

View →
cs.LGcs.CLRecentMay 28, 2026

CSULoRA: Closest Safe Update Low-Rank Adaptation

Oleksandr Marchenko Breneur, Adelaide Danilov, Aria Nourbakhsh, Salima Lamsiyah

CSULoRA is a post-hoc method that corrects trained LoRA adapters by estimating a safety-aligned subspace and solving a penalized minimum-change problem to attenuate unsafe update directions while pres…

View →
cs.AIcs.CRcs.SERecentMay 24, 2026

Inverting the Shield: Systematically Generating Safety Tests from Policy Specifications

Xiaoyue Lu, Xianglin Yang, Haijun Liu, Jiahao Liu +3 more

The paper introduces POLARIS, a novel framework that systematically generates comprehensive and verifiable safety tests for LLMs by formalizing natural language policies into First-Order Logic and exp…

View →
cs.CRcs.AIRecentApr 13, 2026

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

Wei Zhao, Zhe Li, Peixin Zhang, Jun Sun

ClawGuard is a novel runtime security framework that deterministically enforces user-confirmed rules at tool-call boundaries to protect LLM agents from indirect prompt injection.

View →
cs.CRcs.LORecentApr 14, 2026

COBALT-TLA: A Neuro-Symbolic Verification Loop for Cross-Chain Bridge Vulnerability Discovery

Dominik Blain

COBALT-TLA introduces a neuro-symbolic verification loop that successfully and autonomously discovers novel cross-chain bridge vulnerabilities by integrating an LLM with the TLA+ model checker.

View →
cs.CRcs.AIcs.LGRecentApr 22, 2026

SafeRedirect: Defeating Internal Safety Collapse via Task-Completion Redirection in Frontier LLMs

Chao Pan, Yu Wu, Xin Yao

The paper introduces SafeRedirect, a system-level defense that prevents frontier LLMs from generating harmful content during legitimate tasks that structurally require it, significantly reducing unsaf…

View →
cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →
cs.CRRecentApr 8, 2026

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

Hengkai Ye, Zhechang Zhang, Jinyuan Jia, Hong Hu

The paper introduces TRUSTDESC, a novel framework that prevents tool poisoning attacks in LLM applications by automatically generating highly accurate and trusted tool descriptions directly from the t…

View →
cs.CRcs.LGRecentApr 22, 2026

Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs

Krishiv Agarwal, Ramneet Kaur, Colin Samplawski, Manoj Acharya +5 more

The paper conducts an interpretability-driven safety audit of eight state-of-the-art LLMs, demonstrating that while interpretability-based steering is a powerful auditing tool, model robustness varies…

View →
cs.CRcs.AIcs.CLRecentMar 21, 2026

T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

Hyomin Lee, Sangwoo Park, Yumin Choi, Sohyun An +2 more

The paper introduces T-MAP, a trajectory-aware evolutionary search method, to discover and generate multi-step adversarial prompts that exploit vulnerabilities in autonomous LLM agents through tool ex…

View →