~ similar to 2605.30022· 16 results
The paper demonstrates that positional encodings are not necessary for transformers to achieve universal computation, showing that the inherent mechanism of sliding context windows already provides su…
The paper introduces Morlet Positional Encoding (MoPE), a novel wavelet-based positional encoding that models position and locality simultaneously, outperforming standard sinusoidal and RoPE methods.
The paper analyzes the distinct computational roles of positional versus symbolic attention heads in Transformers, demonstrating that symbolic mechanisms generalize more reliably to longer sequences t…
The paper proposes Periodic RoPE (P-RoPE) combined with a dual-layer attention mechanism to overcome the positional encoding limitations of LLMs, enabling theoretically infinite context understanding.
The paper proposes a disentangled representation framework to significantly improve few-shot layout-to-image generation by separating semantic identity from local visual details, thereby mitigating re…
The paper demonstrates that content suppression techniques used in language models only mask prohibited content at the output level, failing to eliminate the underlying concepts from the model's inter…
This paper benchmarks five positional encoding strategies for transformer-based EEG foundation models, concluding that the optimal encoding is task-dependent and no single strategy is universally supe…
Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more
The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…
The paper introduces a distributional framework using Wasserstein distance to unify the semantic comparison of sparse autoencoder features across different layers and to automatically compress large f…
The paper systematically compares multiple content representations for RAG pipelines and finds that answer retention—the ability of the representation to preserve the original answer-bearing content—i…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos +1 more
The paper proposes an inference-time activation steering framework, utilizing orthogonalization, to achieve fine-grained, deterministic control over discrete musical attributes like Pitch and Duration…
The paper demonstrates that Transformers trained on local comparisons implicitly learn a global, one-dimensional ordinal structure, mirroring the human ability to perform transitive inference.
The paper argues that large activation spikes in LLMs are structural vector biases, and proposes a novel quantization framework (INSERTQUANT) to eliminate these spikes, enabling robust low-bit quantiz…
Mingkuan Zhao, Yide Gao, Wentao Hu, Suquan Chen +5 more
The paper proposes Resonant Context Anchoring (RCA), a lightweight, training-free method that enhances factual faithfulness in LLMs by dynamically amplifying the signal of external context evidence du…
This paper analyzes the decoding process of masked diffusion models for graph-to-text generation, finding that structural fine-tuning disrupts natural entity-first generation and proposing a structura…