Papers similar to 2605.04033v1

~ similar to 2605.04033v1· 20 results

cs.AIcs.CLcs.LORecentMay 27, 2026

Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability

The paper evaluates LLM reasoning on Boolean satisfiability (SAT) problems, concluding that conventional metrics are misleading and proposing a paired-formula protocol with Accurate Differentiation Ra…

View →

cs.CRRecentApr 30, 2026

SBN Explorer: An Empirical Study of Cryptographic Boolean Networks

Arnaud Valence

The paper systematically explores a vast design space of cryptographic Boolean networks by formalizing six structural constraints, finding that optimal designs result from sparse, mutually compatible…

View →

cs.AIcs.LGRecentJun 1, 2026

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

Ekaterina Alimaskina, Darya Rudas, Denis Shveykin, Gleb Molodtsov +2 more

The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…

View →

cs.AIcs.CRRecentMar 26, 2026

On the Foundations of Trustworthy Artificial Intelligence

TJ Dunham

The paper proves that platform-deterministic inference is a necessary and sufficient condition for trustworthy AI, establishing that AI trust fundamentally relies on consistent arithmetic.

View →

cs.CCcs.DSRecentMay 30, 2026

Search-space Reduction for Boolean MinCSPs via Essential Constraints

Bart M. P. Jansen, Ruben F. A. Verhaegh

The paper introduces a method to efficiently detect 'essential' constraints in Boolean MinCSPs, significantly reducing the search space for solving these problems and providing a dichotomy theorem for…

View →

cs.CRRecentApr 8, 2026

PSR2: A Phase-based Semantic Reasoning Framework for Atomicity Violation Detection via Contract Refinement

Xiaoqi Li, Xin Wang, Wenkai Li, Zongwei Li

The paper introduces PSR extsuperscript{2}, a novel static analysis framework that significantly improves the detection of atomicity violations in smart contracts by combining structural path searchin…

View →

cs.CRcs.LORecentApr 14, 2026

COBALT-TLA: A Neuro-Symbolic Verification Loop for Cross-Chain Bridge Vulnerability Discovery

Dominik Blain

COBALT-TLA introduces a neuro-symbolic verification loop that successfully and autonomously discovers novel cross-chain bridge vulnerabilities by integrating an LLM with the TLA+ model checker.

View →

cs.AIcs.CRcs.SERecentMar 19, 2026

Implicit Patterns in LLM-Based Binary Analysis

Qiang Li, XiangRui Zhang, Haining Wang

This paper analyzes large-scale reasoning traces from LLM-based binary vulnerability analysis, identifying four structured, token-level implicit patterns that govern how LLMs explore code paths.

View →

cs.CRcs.LGRecentMay 22, 2026

Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin

Matthias Cosler, Cas Cremers, Bernd Finkbeiner, Mohamed Ghanem +1 more

The paper introduces a reinforcement learning framework, inspired by AlphaZero, to automate and improve the proof search process within the Tamarin protocol analysis tool, resulting in shorter and mor…

View →

cs.CRcs.LGRecentMar 19, 2026

Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference

Pranay Anchuri, Matteo Campanelli, Paul Cesaretti, Rosario Gennaro +3 more

The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…

View →

cs.AIcs.LORecentMay 28, 2026

Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability

Pedro Orvalho, Marta Kwiatkowska, Guillem Alenyà, Felip Manyà

The paper proposes a hybrid reasoning framework where Large Language Models (LLMs) generate code to encode complex optimization problems into a preference-based Maximum Satisfiability (MaxSAT) format,…

View →

cs.AIRecentMay 28, 2026

Transforming and Encoding FTS for SAT Solving: What Helps, What Hurts (Extended Version)

João Filipe, Álvaro Torralba, Gregor Behnke

This paper investigates various methods for encoding factored tasks, a compact planning representation, into propositional logic for use with SAT solvers, analyzing the impact of encoding choices and…

View →

cs.CRcs.LGRecentMay 29, 2026

Bit-Exact AI Inference Verification Without Performance Tradeoffs

Naci Cankaya

The paper proposes a method for bit-exact verification of AI inference outputs without sacrificing performance, demonstrating that deterministic, precise re-computation is possible even across differe…

View →

cs.AIRecentMay 27, 2026

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more

The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.

View →

cs.CRcs.LOcs.SERecentApr 4, 2026

Optimal Circuit Synthesis of Linear Codes for Error Detection and Correction

Xi Yang, Taolue Chen, Yuqi Chen, Fu Song +2 more

This paper introduces a novel algorithm, CiSC, to efficiently and optimally synthesize circuit implementations of linear codes for hardware security, significantly outperforming existing state-of-the-…

View →

cs.CLcs.AIcs.LGRecentJun 1, 2026

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

Atoosa Chegini, Soheil Feizi

The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…

View →

cs.AIRecentMay 27, 2026

Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles

Lu Yan, Xuan Chen, Xiangyu Zhang

The paper introduces WIRE, a pipeline for diagnosing live intra-policy rule conflicts in LLM agents by identifying and testing specific rule pairs within a single prompt policy that can co-govern a re…

View →

cs.AIcs.CRcs.LGRecentMar 22, 2026

Silent Commitment Failure in Instruction-Tuned Language Models: Evidence of Governability Divergence Across Architectures

Gregory M. Ruddell

The paper demonstrates that many instruction-tuned language models suffer from 'silent commitment failure,' meaning they can produce confidently incorrect outputs without any warning signal, and intro…

View →

cs.CRcs.ARcs.LGRecentMay 11, 2026

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri

This review analyzes the dual impact of integrating Large Language Models (LLMs) into hardware design, detailing both their transformative potential in EDA and the critical security vulnerabilities th…

View →

cs.SEcs.CRquant-phRecentMay 1, 2026

Semantics-Based Verification of an Implemented Shor Oracle for ECDLP in Qrisp

Lei Zhang, Zhiyuan Chen

The paper introduces a semantics-first verification framework for an implemented Shor oracle for ECDLP in Qrisp, demonstrating that even seemingly correct implementations can fail due to subtle contro…

View →