~ similar to 2605.25673v1· 20 results
LLM-FACETS introduces an open-source, privacy-preserving framework designed to enable non-technical domain experts and compliance officers to audit and evaluate the transparency and accountability of…
Pramana introduces a standardized, protocol-level wire format for autonomous agent outputs, ensuring that every consequential claim is accompanied by a verifiable artifact that can be re-executed by a…
The paper introduces MolTrust, a production-deployed trust infrastructure built on W3C standards (VCs and DIDs) that provides a verifiable, multi-layered authorization framework for autonomous AI agen…
Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary +11 more
The paper conducts a preliminary safety evaluation of the open-weight LLM Kimi K2.5, finding that while it is highly capable, it exhibits concerning dual-use risks, particularly regarding CBRNE misuse…
The paper introduces the concept of 'authenticity debt'—the institutional liability from deploying unverified AI content—and proposes a layered reference architecture combining cryptographic provenanc…
The paper introduces the concept of 'authenticity debt'—the institutional liability from deploying unverified AI content—and proposes a layered reference architecture combining cryptographic provenanc…
The paper proposes a trust-boundary architecture using Lean 4 to verify the deterministic structured computations surrounding LLM pipelines, providing verifiable certificates for high-stakes deploymen…
The paper proposes a trust schema and verification framework to ensure that agent skills, which augment LLMs, are rigorously verified before deployment, thereby making human-in-the-loop oversight scal…
The paper proposes a TEE-based architecture that enables external, auditable verification of AI-assisted grant evaluations without exposing the proprietary model, scoring logic, or intermediate reason…
The paper proposes a unified closed-loop threat taxonomy to systematically analyze and defend foundation models by explicitly framing the bidirectional security interactions between data and models.
The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…
Krishiv Agarwal, Ramneet Kaur, Colin Samplawski, Manoj Acharya +5 more
The paper conducts an interpretability-driven safety audit of eight state-of-the-art LLMs, demonstrating that while interpretability-based steering is a powerful auditing tool, model robustness varies…
The paper introduces and demonstrates 'narrow secret loyalties,' a novel type of covert model manipulation that biases model output toward a specific principal's interests under narrow conditions, whi…
The paper introduces False Security Confidence (FSC), a new metric to measure the inherent prevalence of security vulnerabilities in code generated by LLMs that are otherwise functionally correct, eve…
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more
The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.
The paper introduces CSTM-Bench, a comprehensive benchmark and evaluation framework demonstrating that standard session-bound AI guardrails fail against sophisticated, cross-session attacks that accum…
Alexandra Souly, Robert Kirk, Jacob Merizian, Abby D'Cruz +1 more
The study evaluated four frontier AI models to assess their reliability in following safety research goals, finding no confirmed instances of sabotage but noting that certain models frequently refuse…
Hongyu Cai, Arjun Arunasalam, Yiming Liang, Antonio Bianchi +1 more
The paper proposes a novel pre-model safeguard that uses small draft models (SLMs) to predict the safety of prompts, significantly reducing false-negative rates while maintaining low computational ove…
The paper introduces an efficient, lightweight LLM framework for smart contract auditing that decouples the audit process into multiple components, achieving high accuracy while significantly reducing…
The paper proposes a Semantic Gateway and a Zero-Trust security model to formally validate and secure autonomous AI agents operating in enterprise systems, achieving a 100% discovery rate of unauthori…