Papers similar to 2605.16407v1

~ similar to 2605.16407v1· 20 results

cs.CRRecentMar 30, 2026

Attesting LLM Pipelines: Enforcing Verifiable Training and Release Claims

Zhuoran Tan, Jeremy Singer, Christos Anagnostopoulos

The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…

View →

cs.CRcs.AIcs.MARecentMay 1, 2026

Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

Alfredo Metere

The paper proposes a trust schema and verification framework to ensure that agent skills, which augment LLMs, are rigorously verified before deployment, thereby making human-in-the-loop oversight scal…

View →

cs.CRcs.AIRecentMay 7, 2026

From Specification to Deployment: Empirical Evidence from a W3C VC + DID Trust Infrastructure for Autonomous Agents

Lars Kersten Kroehl

The paper introduces MolTrust, a production-deployed trust infrastructure built on W3C standards (VCs and DIDs) that provides a verifiable, multi-layered authorization framework for autonomous AI agen…

View →

cs.AIcs.SEEmpiricalRecentJul 16, 2026

Proof-or-Stop: Don't Trust the Agent, Trust the Evidence -- Loop Engineering for Verifiable Evidence-Gated Lifecycle Control

Jek Huang, Jeffery Hsia, Jiayi Sun, Freddie Shi +2 more

This paper introduces Proof-or-Stop Lifecycle Control, a method that allows lifecycle transitions only when mechanically verifiable evidence is provided, and evaluates its implementation.

View →

cs.AIRecentMay 29, 2026

LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability

Tom Lucas, Alessio Buscemi, Alfredo Capozucca, German Castignani +1 more

LLM-FACETS introduces an open-source, privacy-preserving framework designed to enable non-technical domain experts and compliance officers to audit and evaluate the transparency and accountability of…

View →

cs.CRcs.AIRecentApr 22, 2026

CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge

Gustav Keppler, Ghada Elbez, Veit Hagenmeyer

The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…

View →

cs.CRcs.LOcs.MARecentMay 19, 2026

Pramana: A Protocol-Layer Treatment of Claim Verification in Autonomous Agent Networks

Ravi Kiran Kadaboina

Pramana introduces a standardized, protocol-level wire format for autonomous agent outputs, ensuring that every consequential claim is accompanied by a verifiable artifact that can be re-executed by a…

View →

cs.CRRecentMay 20, 2026

An Evidence-driven Protocol for Trustworthy CI Pipelines

Fernando Castillo, Eduardo Brito, Pille Pullonen-Raudvere, Sebastian Werner +1 more

The paper proposes an evidence-driven protocol combining Deterministic Build Systems and Trusted Execution Environments to provide cryptographically verifiable guarantees of software artifact integrit…

View →

cs.CRcs.AIcs.CYRecentApr 28, 2026

Making AI-Assisted Grant Evaluation Auditable without Exposing the Model

Kemal Bicakci

The paper proposes a TEE-based architecture that enables external, auditable verification of AI-assisted grant evaluations without exposing the proprietary model, scoring logic, or intermediate reason…

View →

cs.CRRecentMay 6, 2026

Sealing the Audit-Runtime Gap for LLM Skills

Tingda Shen, Yebo Feng, Konglin Zhu, Xiaojun Jia +2 more

The paper introduces SIGIL, a novel framework that cryptographically seals the entire lifecycle of LLM skills, ensuring verifiable integrity from publication through runtime execution to prevent suppl…

View →

cs.LOcs.CEcs.ETRecentJun 1, 2026

Federated Formal Verification: Cross-Backend Citation, Cross-Axis Convergence, and AI-Orchestrated Proof Dispatch for Production Systems

Pierre Falda

The paper proposes a federated formal verification architecture that treats verification as a polyglot proof system, successfully validating it on complex production subsystems like a Raft consensus m…

View →

cs.CRRecentMay 28, 2026

Bridging Theory and Practice: An Executable Taxonomy of Security Properties for ProVerif and Tamarin

Leonard Tudorache, Ivan Kurtev, Mark van den Brand

The paper introduces a systematic, executable taxonomy of security properties to bridge the gap between theoretical security definitions and their practical implementation in formal verification tools…

View →

cs.CRcs.AIRecentMar 31, 2026

Security in LLM-as-a-Judge: A Comprehensive SoK

Aiman Al Masoud, Antony Anju, Marco Arazzi, Mert Cihangiroglu +5 more

This paper provides the first comprehensive Systematization of Knowledge (SoK) on the security aspects of LLM-as-a-Judge (LaaJ) systems, identifying key vulnerabilities and proposing a taxonomy for fu…

View →

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.CRcs.AIRecentMay 14, 2026

MemLineage: Lineage-Guided Enforcement for LLM Agent Memory

Ciyan Ouyang, Rui Hou

MemLineage introduces a novel, cryptographically-backed defense mechanism that enforces a chain-of-custody for LLM agent memory, preventing untrusted or poisoned state from justifying sensitive action…

View →

cs.CRRecentApr 18, 2026

From Public-Key Linting to Operational Post-Quantum X.509 Assurance for ML-KEM and ML-DSA: Registry-Driven Policy, Mutation-Based Evaluation, and Import Validation

José Luis Delgado Jiménez

The paper introduces an operational post-quantum X.509 assurance framework that rigorously validates ML-KEM and ML-DSA certificates and keys across various deployment stages, achieving comprehensive d…

View →

cs.CRcs.AIcs.ETRecentApr 27, 2026

Agentic Witnessing: Pragmatic and Scalable TEE-Enabled Privacy-Preserving Auditing

Antony Rowstron

The paper proposes Agentic Witnessing, a TEE-enabled framework that allows external verifiers to audit the qualitative properties of private datasets by querying an LLM-based auditor without accessing…

View →

cs.CRcs.AIRecentApr 28, 2026

From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems

Ignacio Peyrano

The paper proposes a Semantic Gateway and a Zero-Trust security model to formally validate and secure autonomous AI agents operating in enterprise systems, achieving a 100% discovery rate of unauthori…

View →

cs.LOcs.AIcs.CRRecentApr 1, 2026

Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving

Devakh Rashie, Veda Rashi

The paper introduces the Lean-Agent Protocol, a formal verification platform that uses Lean 4 theorem proving to ensure agentic AI actions in finance are mathematically compliant with complex regulati…

View →

cs.CRcs.AIcs.CLRecentJun 2, 2026

Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation

Bagus Rakadyanto Oktavianto Putra, Muhamad Risqi Utama Saputra, Widyawan, Guntur Dharma Putra

The paper introduces an efficient, lightweight LLM framework for smart contract auditing that decouples the audit process into multiple components, achieving high accuracy while significantly reducing…

View →