~ similar to 2605.30995· 20 results
LLM-FACETS introduces an open-source, privacy-preserving framework designed to enable non-technical domain experts and compliance officers to audit and evaluate the transparency and accountability of…
The paper introduces the Sovereign Context Protocol (SCP), an open-source, attribution-aware data access layer designed to standardize how Large Language Models (LLMs) connect to and track usage of hu…
The paper introduces RefWalk, a novel framework designed to improve regulatory compliance question answering by ensuring rigorous citation traceability and explicit per-rule attribution across complex…
Bing Liu, Shunping Wang, Yufan Zhu, Xinyi Yu +4 more
This paper introduces 'implicit identity' as a unifying framework to survey and categorize LLM fingerprinting and watermarking techniques for verifying ownership and provenance across datasets, models…
The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…
Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more
The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…
The paper proposes GovAI-Pipe, a novel four-layer governance pipeline that operationalizes high-level AI policies (like the EU AI Act) into auditable, technical checkpoints for deploying AI in large-s…
The paper introduces a hybrid system, HYBRIDSOURCETRACKER (HST), that combines vector search and Winnowing fingerprinting to achieve scalable, high-precision provenance tracking for code generated by…
The paper proposes a TEE-based architecture that enables external, auditable verification of AI-assisted grant evaluations without exposing the proprietary model, scoring logic, or intermediate reason…
The paper proposes a trust-boundary architecture using Lean 4 to verify the deterministic structured computations surrounding LLM pipelines, providing verifiable certificates for high-stakes deploymen…
This study finds that when users do not specify a jurisdiction, the language used in the prompt strongly biases the LLM's response toward a specific national legal framework (U.S. for English, China f…
The paper introduces BenGER, a comprehensive benchmark for evaluating LLMs on German legal reasoning, demonstrating that closed-flagship models perform best and that human-AI co-creation significantly…
The paper introduces a novel guardrail orchestration layer that improves the compliance and efficiency of high-stakes multimodal document generation by scoring multiple generated candidates against we…
AutoVerifier is an LLM-based agentic framework that automates the end-to-end verification of complex technical claims, enabling non-experts to generate evidence-backed intelligence assessments.
This paper conducts a large-scale audit of human annotation reporting in NLP, finding that while reporting has improved, critical details needed to assess annotation validity, such as training and agr…
The paper introduces a novel, scalable framework to monitor and classify dataset usage within research literature, addressing the current lack of infrastructure for tracking data citations.
Ethan Zhao, Maksym Taranukhin, Wei Cui, Moira Aikenhead +1 more
The paper introduces CanLegalRAGBench, a new Canadian legal QA benchmark, and evaluates RAG systems, finding that while open-source models are competitive, automatic evaluations struggle with nuanced…
The paper introduces WIRE, a pipeline for diagnosing live intra-policy rule conflicts in LLM agents by identifying and testing specific rule pairs within a single prompt policy that can co-govern a re…
The paper proposes an Interpretive Audit Pipeline to evaluate LLMs for public comment analysis, arguing that measuring inter-model disagreement is crucial because standard accuracy metrics fail to det…
This paper provides a systematic regulatory mapping and compliance architecture for AI agents operating under the complex web of EU laws, concluding that high-risk agents with untraceable behavioral d…