Papers similar to 2604.25200v1

20 results

cs.AI · Recent · May 29, 2026

LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability

Tom Lucas, Alessio Buscemi, Alfredo Capozucca, German Castignani +1 more

LLM-FACETS introduces an open-source, privacy-preserving framework designed to enable non-technical domain experts and compliance officers to audit and evaluate the transparency and accountability of…

cs.LO, cs.CL, cs.CR · Recent · May 13, 2026

Proof-Carrying Certificates for LLM Pipelines: A Trust-Boundary Architecture

George Koomullil

The paper proposes a trust-boundary architecture using Lean 4 to verify the deterministic structured computations surrounding LLM pipelines, providing verifiable certificates for high-stakes deploymen…

cs.CR, cs.AI, cs.CL · Recent · Jun 2, 2026

Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation

Bagus Rakadyanto Oktavianto Putra, Muhamad Risqi Utama Saputra, Widyawan, Guntur Dharma Putra

The paper introduces an efficient, lightweight LLM framework for smart contract auditing that decouples the audit process into multiple components, achieving high accuracy while significantly reducing…

cs.AI · Recent · May 29, 2026

PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges

Swastik Roy, Rajkumar Pujari, Tharindu Kumarage, Charith Peris +4 more

PReMISE introduces a framework to audit and improve the quality of rubrics used to guide LLM judges, demonstrating that it can significantly increase judge accuracy and reduce the exploitability of re…

cs.CR, cs.AI · Recent · May 25, 2026

Referential Security as a New Paradigm for AI Evaluations

Dan Ristea, Vasilios Mavroudis

The paper proposes referential security as a new paradigm for AI evaluation to ensure that safety claims and audits remain tied to specific, verifiable system instances despite continuous, unannounced…

cs.CR, cs.LO, cs.MA · Recent · May 19, 2026

Pramana: A Protocol-Layer Treatment of Claim Verification in Autonomous Agent Networks

Ravi Kiran Kadaboina

Pramana introduces a standardized, protocol-level wire format for autonomous agent outputs, ensuring that every consequential claim is accompanied by a verifiable artifact that can be re-executed by a…

cs.CR, cs.AI · Recent · May 7, 2026

From Specification to Deployment: Empirical Evidence from a W3C VC + DID Trust Infrastructure for Autonomous Agents

Lars Kersten Kroehl

The paper introduces MolTrust, a production-deployed trust infrastructure built on W3C standards (VCs and DIDs) that provides a verifiable, multi-layered authorization framework for autonomous AI agen…

cs.CR · Recent · Mar 30, 2026

Attesting LLM Pipelines: Enforcing Verifiable Training and Release Claims

Zhuoran Tan, Jeremy Singer, Christos Anagnostopoulos

The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…

cs.CR · Recent · Mar 24, 2026

Leveraging Large Language Models for Trustworthiness Assessment of Web Applications

Oleksandr Yarotskyi, José D'Abruzzo Pereira, João R. Campos

This paper proposes an empirical methodology to automate web application trustworthiness assessment by leveraging Large Language Models (LLMs) to verify adherence to secure coding practices, showing t…

cs.CR, cs.AI, cs.ET · Recent · Apr 27, 2026

Agentic Witnessing: Pragmatic and Scalable TEE-Enabled Privacy-Preserving Auditing

Antony Rowstron

The paper proposes Agentic Witnessing, a TEE-enabled framework that allows external verifiers to audit the qualitative properties of private datasets by querying an LLM-based auditor without accessing…

cs.CR, cs.AI · Recent · Mar 31, 2026

Security in LLM-as-a-Judge: A Comprehensive SoK

Aiman Al Masoud, Antony Anju, Marco Arazzi, Mert Cihangiroglu +5 more

This paper provides the first comprehensive Systematization of Knowledge (SoK) on the security aspects of LLM-as-a-Judge (LaaJ) systems, identifying key vulnerabilities and proposing a taxonomy for fu…

cs.CL · Recent · May 28, 2026

Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs

David Gros, Adam Gleave

The paper tested the hypothesis that wrapping untrusted prompt inputs in mock tool calls would improve LLM robustness, but found that this technique generally fails and can even increase vulnerability…

cs.CR · Recent · May 25, 2026

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

Faruk Alpay, Taylan Alpay

The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…

cs.CR · Recent · Mar 25, 2026

Trusted-Execution Environment (TEE) for Solving the Replication Crisis in Academia

Jiasun Li, Project Team

The paper proposes using Trusted-Execution Environments (TEEs) to create a scalable, privacy-preserving system where authors can submit cryptographic proofs of correct research replication, thereby ad…

cs.AI, cs.CR · Recent · May 11, 2026

From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

Pedro Conde, Henrique Branquinho, Valerio Mazzone, Bruno Mendes +2 more

The paper introduces a novel, practical evaluation protocol that shifts the assessment of AI pentesting agents from simple task completion to validated, open-ended vulnerability discovery in complex,…

cs.CR, cs.AI · Recent · Jun 2, 2026

"Important You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

Hang Li, Fedor Filippov, Yuling Lin, Pengfei He +5 more

This paper investigates the vulnerability of LLM-based automatic grading systems to prompt injection (PI) attacks, demonstrating that current systems are highly susceptible to manipulation that can le…

cs.CR · Recent · May 6, 2026

Sealing the Audit-Runtime Gap for LLM Skills

Tingda Shen, Yebo Feng, Konglin Zhu, Xiaojun Jia +2 more

The paper introduces SIGIL, a novel framework that cryptographically seals the entire lifecycle of LLM skills, ensuring verifiable integrity from publication through runtime execution to prevent suppl…

cs.AI · Recent · May 27, 2026

Benchmarking AI for low-resource contexts: Thinking beyond leaderboards

Aakash Pant, Kavya Shah, Apoorv Agnihotri, Sneha Nikam +2 more

The paper critiques current AI benchmarking practices for low-resource settings, arguing that evaluation must shift focus from isolated model performance to the holistic performance of the deployed sy…

cs.AI, cs.CR, cs.IR · Recent · Apr 3, 2026

AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models

Yuntao Du, Minh Dinh, Kaiyuan Zhang, Ninghui Li

AutoVerifier is an LLM-based agentic framework that automates the end-to-end verification of complex technical claims, enabling non-experts to generate evidence-backed intelligence assessments.

cs.CR, cs.AI, cs.PL · Recent · May 1, 2026

Certified Purity for Cognitive Workflow Executors: From Static Analysis to Cryptographic Attestation

Alan L. McCann

The paper introduces a certified purity architecture that strengthens governance in cognitive workflow systems by replacing insufficient runtime checks with cryptographically attested structural guara…
