ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2606.00926 · 16 results

cs.LG · cs.CC · Recent · Jun 1, 2026

Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete

Qian Li, Xinyu Mao, Shang-Hua Teng

The paper demonstrates that positional encodings are not necessary for transformers to achieve universal computation, showing that the inherent mechanism of sliding context windows already provides su…

cs.LG · cs.CL · Recent · May 29, 2026

Trading Complexity for Expressivity Through Structured Generalized Linear Token Mixing

Erwan Fagnou, Paul Caillon, Blaise Delattre, Alexandre Allauzen

The paper proposes a unified framework for designing efficient and expressive token mixing layers by separating the direct and recurrent influences of inputs, allowing for a principled trade-off betwe…

cs.CL · cs.AI · cs.LG · Recent · May 30, 2026

Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink

Yuhang Jiang

The paper demonstrates that in Mamba-2, single-bucket probes can detect a large functional signature (detection layer) that is not fully responsible for the actual computation (execution layer), chall…

cs.LG · cs.AI · Recent · Jun 1, 2026

When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures

Yongzhong Xu

The paper tracks the developmental emergence of attention circuits in 1B-class language models, finding that the formation of induction and attention-sink circuits are distinct, temporally separated t…

cs.LG · cs.AI · cs.CR · Recent · May 3, 2026

Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance

Anamika Paul Rupa, Anietie Andy

The paper introduces Probe-Geometry Alignment (PGA), a surgical method that removes the measurable cross-sequence memorization signature from large language models without degrading their general capa…

cs.AI · Recent · May 27, 2026

Do Agents Think Deeper? A Mechanistic Investigation of Layer-Wise Dynamics in Sequential Planning

Zhenyu Cui, Xiangzhong Luo

The paper investigates how LLMs allocate their internal computational depth during multi-turn agentic planning, finding that agents progressively recruit deeper layers and shift toward corrective upda…

cs.AI · cs.CR · cs.CY · Recent · Apr 16, 2026

Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

Krti Tallam

The paper introduces 'layered mutability,' a framework for analyzing how persistent self-modifying AI agents drift away from intended behavior due to the accumulation of locally reasonable, uncoordina…

cs.AI · Recent · May 30, 2026

Mitigating Hallucinations in Large Language Models Via Decoder Layer Skipping

Hanze Li, Jinhao You, Yichen Guo, Kai Tang +2 more

The paper introduces DeLask, a novel decoding framework that dynamically skips or partially aggregates problematic decoder layers to significantly mitigate hallucinations in Large Language Models.

cs.CL · Recent · May 28, 2026

Probing the Prompt KV Cache: Where It Becomes Dispensable

Vinayshekhar Bannihatti Kumar, Manoj Ghuhan Arivazhagan, Disha Makhija, Rashmi Gangadharaiah

This paper investigates the redundancy of the prompt KV cache during language model decoding, finding that the structure provided by chat templates is the primary source of redundancy, not the actual…

cs.AI · Recent · May 28, 2026

Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion

Yizhuo Lu, Changde Du, Qingyu Shi, Hang Chen +4 more

Mind-Omni introduces a unified multi-task framework that models the interplay between brain, vision, and language signals using a discrete diffusion paradigm, achieving state-of-the-art performance ac…

cs.LG · cs.AI · cs.CL · Recent · May 27, 2026

Parallax: Parameterized Local Linear Attention for Language Modeling

Yifei Zuo, Dhruv Pai, Zhichen Zeng, Alec Dewulf +2 more

The paper introduces Parallax, a scalable and numerically stable parameterized Local Linear Attention mechanism that significantly improves LLM performance and efficiency compared to existing methods…

cs.CL · cs.AI · cs.LG · Recent · May 31, 2026

MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models

Partha Pratim Saha, Samarth Raina, Mayur Parvatikar, Amit Dhanda +3 more

The paper introduces MENTIS, a geometry-first framework that measures how preference alignment structurally changes the internal computations of language models, finding that these changes are selecti…

cs.LG · cs.AI · Empirical · Comprehensive · Recent · Jun 4, 2026

Pretraining Recurrent Networks without Recurrence

Akarsh Kumar, Phillip Isola

This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.

cs.LG · Recent · Jun 1, 2026

Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

Yung-Chin Chen, Chung Peng Lee, Ze-Wei Liou, Naveen Verma

The paper argues that large activation spikes in LLMs are structural vector biases, and proposes a novel quantization framework (INSERTQUANT) to eliminate these spikes, enabling robust low-bit quantiz…

cs.FL · cs.CL · cs.LG · Recent · Jun 1, 2026

An Algebraic View of the Expressivity of Recurrent Language Models

Franz Nowak, Ryan Cotterell, Reda Boumasmoud

The paper provides a unified algebraic framework to determine the formal language expressivity of recurrent neural language models, resolving conflicts in existing literature by linking expressivity t…
