~ similar to 2605.04033v1· 20 results
The paper evaluates LLM reasoning on Boolean satisfiability (SAT) problems, concluding that conventional metrics are misleading and proposing a paired-formula protocol with Accurate Differentiation Ra…
The paper systematically explores a vast design space of cryptographic Boolean networks by formalizing six structural constraints, finding that optimal designs result from sparse, mutually compatible…
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…
The paper proves that platform-deterministic inference is a necessary and sufficient condition for trustworthy AI, establishing that AI trust fundamentally relies on consistent arithmetic.
The paper introduces a method to efficiently detect 'essential' constraints in Boolean MinCSPs, significantly reducing the search space for solving these problems and providing a dichotomy theorem for…
The paper introduces PSR extsuperscript{2}, a novel static analysis framework that significantly improves the detection of atomicity violations in smart contracts by combining structural path searchin…
COBALT-TLA introduces a neuro-symbolic verification loop that successfully and autonomously discovers novel cross-chain bridge vulnerabilities by integrating an LLM with the TLA+ model checker.
This paper analyzes large-scale reasoning traces from LLM-based binary vulnerability analysis, identifying four structured, token-level implicit patterns that govern how LLMs explore code paths.
Matthias Cosler, Cas Cremers, Bernd Finkbeiner, Mohamed Ghanem +1 more
The paper introduces a reinforcement learning framework, inspired by AlphaZero, to automate and improve the proof search process within the Tamarin protocol analysis tool, resulting in shorter and mor…
The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…
The paper proposes a hybrid reasoning framework where Large Language Models (LLMs) generate code to encode complex optimization problems into a preference-based Maximum Satisfiability (MaxSAT) format,…
This paper investigates various methods for encoding factored tasks, a compact planning representation, into propositional logic for use with SAT solvers, analyzing the impact of encoding choices and…
The paper proposes a method for bit-exact verification of AI inference outputs without sacrificing performance, demonstrating that deterministic, precise re-computation is possible even across differe…
Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more
The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.
Xi Yang, Taolue Chen, Yuqi Chen, Fu Song +2 more
This paper introduces a novel algorithm, CiSC, to efficiently and optimally synthesize circuit implementations of linear codes for hardware security, significantly outperforming existing state-of-the-…
The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…
The paper introduces WIRE, a pipeline for diagnosing live intra-policy rule conflicts in LLM agents by identifying and testing specific rule pairs within a single prompt policy that can co-govern a re…
The paper demonstrates that many instruction-tuned language models suffer from 'silent commitment failure,' meaning they can produce confidently incorrect outputs without any warning signal, and intro…
This review analyzes the dual impact of integrating Large Language Models (LLMs) into hardware design, detailing both their transformative potential in EDA and the critical security vulnerabilities th…
The paper introduces a semantics-first verification framework for an implemented Shor oracle for ECDLP in Qrisp, demonstrating that even seemingly correct implementations can fail due to subtle contro…