Similar to 2605.30523 · 17 results
The paper demonstrates that positional encodings are not necessary for transformers to achieve universal computation, showing that the inherent mechanism of sliding context windows already provides su…
The paper provides a unified algebraic framework to determine the formal language expressivity of recurrent neural language models, resolving conflicts in existing literature by linking expressivity t…
This paper analyzes the computational complexity of verifying feedforward neural networks when their weights are restricted to finite-width arithmetic, finding that verification remains NP-complete fo…
The paper proposes a unified framework for designing efficient and expressive token mixing layers by separating the direct and recurrent influences of inputs, allowing for a principled trade-off betwe…
The paper provides machine-checked proofs demonstrating that fresh per-stage arithmetic masking ensures pipeline-level security for Number Theoretic Transform (NTT) accelerators used in Post-Quantum C…
CART introduces a parameter-efficient recurrent transformer architecture that reuses a core block multiple times, but its performance does not surpass a dense baseline, suggesting that weight sharing…
The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…
Zhengyi Li, Yakai Wang, Kang Yang, Yu Yu +5 more
This paper demonstrates a novel attack against the shuffling defense used in secure Transformer inference, showing that randomly permuted activations can still be exploited to recover model weights.
Ziwen Xu, Haiwen Hong, Linsong Yu, Benglei Cui +3 more
The paper quantifies the exact parametric memory capacity of LLMs using LoRA and proposes a new optimization strategy, MemFT, to enhance memory fidelity.
The paper provides the first machine-checked universal proof, using ring theory, that value-independence implies identical marginal distributions for arithmetic masking, thereby extending the verifica…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…
This paper introduces a novel full-space quantization-driven architecture (FQA) to create highly efficient and accurate hardware approximations of nonlinear activation functions using piecewise polyno…
The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.
HARP introduces a novel, adaptive, learnable orthogonal processor that significantly improves the robustness and accuracy of extreme low-bit LLM quantization compared to fixed methods.
The paper introduces an automatic numeric-remapping attack to test the robustness of LLMs on arithmetic word problems, finding that LLMs remain sensitive to small numeric changes in datasets like GSM8…