Papers similar to 2605.27858

~ similar to 2605.27858· 19 results

cs.CRcs.LOcs.MARecentMay 19, 2026

Pramana: A Protocol-Layer Treatment of Claim Verification in Autonomous Agent Networks

Pramana introduces a standardized, protocol-level wire format for autonomous agent outputs, ensuring that every consequential claim is accompanied by a verifiable artifact that can be re-executed by a…

View →

cs.AIcs.CRcs.IRRecentApr 3, 2026

AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models

Yuntao Du, Minh Dinh, Kaiyuan Zhang, Ninghui Li

AutoVerifier is an LLM-based agentic framework that automates the end-to-end verification of complex technical claims, enabling non-experts to generate evidence-backed intelligence assessments.

View →

cs.SEcs.AIcs.CRRecentMay 12, 2026

Decaf: Improving Neural Decompilation with Automatic Feedback and Search

Alexander Shypula, Osbert Bastani, Edward Schwartz

The paper introduces Decaf, a system that uses automatic feedback and search to significantly improve the semantic correctness and accuracy of neural decompilers, boosting the decompilation rate from…

View →

cs.CRcs.AIcs.CLRecentMay 25, 2026

TTPrint: Evidence-Grounded TTP Extraction via Diverge-then-Converge Verification

Yutong Cheng, Changze Li, Raihan Sultan Pasha Basuki, Qian Cui +2 more

TTPrint proposes a novel diverge-then-converge framework for extracting MITRE ATT&CK techniques from CTI reports, significantly improving both recall and precision compared to existing methods.

View →

cs.CLcs.IRRecentJun 3, 2026

Caliper: Probing Lexical Anchors versus Causal Structure in LLMs

Zhenyu Yu, Shuigeng Zhou

This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.

View →

cs.CRcs.DBRecentMay 3, 2026

Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence

Huining Cui, Wei Liu

The paper introduces RAGCharacter, a forensic framework that enables black-box, character-level traceback to pinpoint the exact poisoned span in retrieved evidence responsible for a misgeneration even…

View →

cs.CLcs.AIRecentMay 31, 2026

Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding

Xin Su, Dawid Majchrowski, Fangyuan Yu, Vanshil Atul Shah +4 more

The paper introduces Hybrid Verified Decoding, a method that predicts the acceptance length of a cache draft to intelligently select between cache verification and model-based drafting, achieving sign…

View →

cs.CRcs.LGRecentApr 7, 2026

AttnDiff: Attention-based Differential Fingerprinting for Large Language Models

Haobo Zhang, Zhenhua Xu, Junxian Li, Shangfeng Sheng +2 more

AttnDiff introduces a data-efficient white-box framework that extracts intrinsic attention-based fingerprints to verify the provenance and detect unauthorized derivation of large language models (LLMs…

View →

cs.GTcs.CRcs.LGRecentMay 8, 2026

Quotient Semivalues for False-Name-Resistant Data Attribution

Florian A. D. Burnat, Brittany I. Davidson

The paper introduces the quotient semivalue mechanism to provide fair data attribution that is resistant to contributors manipulating their reported identities by splitting or duplicating data.

View →

cs.LGcs.AIcs.CRRecentApr 17, 2026

DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

Erchi Wang, Pengrun Huang, Eli Chien, Om Thakkar +3 more

The paper introduces DPrivBench, a new benchmark to test whether large language models (LLMs) can automate the complex reasoning required to verify differential privacy guarantees for algorithms.

View →

cs.CRcs.LGRecentMar 19, 2026

Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference

Pranay Anchuri, Matteo Campanelli, Paul Cesaretti, Rosario Gennaro +3 more

The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…

View →

cs.AIRecentMay 28, 2026

TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation

Yundong Kim, Heyoung Yang

The paper introduces TRACE, a novel metric that evaluates the logical structure of LLM reasoning (CoT) by integrating Toulmin's argumentation theory, demonstrating that sound reasoning structure corre…

View →

cs.CRcs.AIRecentMay 28, 2026

SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing

Almene De Meran Meguimtsop, Maria Leonor Pacheco, Daniel E. Acuna

The paper introduces SciIntBench, an adversarial benchmark that reveals that LLMs' adherence to research integrity norms is highly sensitive to how the misconduct is framed, often failing when the mis…

View →

cs.CRcs.AIRecentMay 28, 2026

SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing

Almene De Meran Meguimtsop, Maria Leonor Pacheco, Daniel E. Acuna

The paper introduces SciIntBench, an adversarial benchmark that reveals that LLMs' adherence to research integrity norms is highly sensitive to how the misconduct is framed, failing particularly when…

View →

cs.LGcs.AIcs.CLRecentMay 28, 2026

Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content

Ihor Stepanov, Aleksandr Smechov

The paper introduces Opir, an efficient family of encoder-based multi-task guardrail models that provides competitive safety classification performance across various tasks while maintaining a signifi…

View →

cs.SEcs.AIcs.CRRecentApr 14, 2026

CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference

Qiang Zhang, Zhongnian Li

The paper proposes CoDe-R, a two-stage framework that significantly improves the accuracy and re-executability of decompiled code generated by LLMs, achieving a new SOTA in the lightweight regime.

View →

cs.CRcs.AIRecentMay 1, 2026

E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems

Zelin Guan, Shengda Zhuo, Zeyan Li, Jinchun He +3 more

E-MIA introduces a novel, stealthy black-box membership inference attack that converts verifiable hard evidence within a candidate document into an objective, multi-part exam score to determine if the…

View →

cs.AIRecentMay 29, 2026

HypoAgent: An Agentic Framework for Interactive Abductive Hypothesis Generation over Knowledge Graphs

Yisen Gao, Yixi Cai, Tianshi Zheng, Jiaxin Bai +1 more

HypoAgent is an agentic framework that enables interactive, multi-turn abductive hypothesis generation over knowledge graphs, achieving state-of-the-art performance by integrating specialized agents f…

View →

cs.LGcs.AIRecentMay 29, 2026

EchoRL: Reinforcement Learning via Rollout Echoing

Jinhe Bi, Aniri, Minglai Yang, Xingcheng Zhou +8 more

EchoRL proposes a lightweight module to exploit valuable learning signals from advantage-degenerated rollouts in Reinforcement Learning with Verifiable Rewards (RLVR), significantly improving LLM post…

View →