Similar to 2605.31558 · 19 results
This paper localizes the attention heads within LLMs responsible for specific reasoning steps, finding that specialized heads handle factual retrieval while higher layers manage global information int…
The paper demonstrates that positional encodings are not necessary for transformers to achieve universal computation, showing that the inherent mechanism of sliding context windows already provides su…
The paper demonstrates that Transformers trained on local comparisons implicitly learn a global, one-dimensional ordinal structure, mirroring the human ability to perform transitive inference.
The paper tracks the developmental emergence of attention circuits in 1B-class language models, finding that the formation of induction circuits and that of attention-sink circuits are distinct, temporally separated t…
This paper demonstrates that large language models spontaneously develop geometric structures corresponding to human perceptual domains (like color or pitch) within their internal layers, suggesting t…
The paper investigates whether modestly sized open-source language models can grasp the semantics of rare Paired-Focus constructions, finding that understanding emerges later in training and correlate…
The paper proposes explicitly disentangling positional and semantic representations in Transformer encoders, demonstrating that this separation allows for a clearer understanding of how positional inf…
The paper proposes Periodic RoPE (P-RoPE) combined with a dual-layer attention mechanism to overcome the positional encoding limitations of LLMs, enabling theoretically infinite context understanding.
Shashi Kumar, Yacouba Kaloga, Petr Motlicek, Ina Kodrasi +1 more
The paper introduces Geometric Latent Reasoning (GLR), a method that models reasoning as continuous paths in the embedding space, showing that this continuous approach allows LLMs to solve problems us…
Garvin Guo, Yu Chen, Xiang Wang, Shuai Li +3 more
The paper deconstructs latent visual reasoning tokens into components and finds that the performance gains are primarily due to boundary markers and attention patterns, not the tokens' ability to enco…
Xiang Li, Jiwei Wei, Ke Liu, Yitong Qin +4 more
The eMoT framework enhances multi-step reasoning in LLMs by treating reasoning as an evolving memory, stabilizing performance through symbolic computation and structured refinement.
DenseSteer is a training-free inference-time framework that improves the math reasoning capabilities of small language models by steering their internal representations toward a 'Dense Reasoning' patt…
The paper investigates how LLMs allocate their internal computational depth during multi-turn agentic planning, finding that agents progressively recruit deeper layers and shift toward corrective upda…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
The paper introduces Reasoning in Memory (RiM), a latent reasoning method that replaces autoregressive token generation with fixed memory blocks to enable compute-efficient internal working memory for…
The paper identifies specific attention heads in LLMs responsible for 'cultural binding'—associating cultural items with appropriate identities—and demonstrates that this capability is pre-trained and…
The paper demonstrates that content suppression techniques used in language models only mask prohibited content at the output level, failing to eliminate the underlying concepts from the model's inter…
This paper demonstrates that transformer-based policies can provably learn complex tree search mechanisms, such as depth-first search, purely through reinforcement learning in a stochastic environment…
The paper investigates compositional abilities in LLMs and humans using the Personal Relation Task, finding that LLMs excel at the structured (Intensional) task while humans are better at the real-wor…