15 results for “Transformer-based encoder”
CS papers only. Hybrid search: keyword + semantic, ranked by combined score.
The paper introduces Morlet Positional Encoding (MoPE), a novel wavelet-based positional encoding that models position and locality simultaneously, outperforming standard sinusoidal and RoPE methods.
The paper demonstrates that positional encodings are not necessary for transformers to achieve universal computation, showing that the inherent mechanism of sliding context windows already provides su…
The paper proposes explicitly disentangling positional and semantic representations in Transformer encoders, demonstrating that this separation allows for a clearer understanding of how positional inf…
The paper introduces Residualized Sparse Autoencoders (ReSAEs) to improve multi-layer interventions in transformers by training each layer on the residual activation, which better preserves cross-laye…
The paper analyzes the expressivity of padded transformers, proving that their computational power is primarily determined by model depth and numeric precision, rather than attention type or width.
CART introduces a parameter-efficient recurrent transformer architecture that reuses a core block multiple times, but its performance does not surpass a dense baseline, suggesting that weight sharing…
Zhisheng Zhang, Xiang Li, Yixuan Zhou, Jing Peng +2 more
LoSATok proposes a low-dimensional semantic-acoustic tokenizer that efficiently compresses high-dimensional audio features into a compact latent space, significantly improving the performance and effi…
The paper demonstrates that content suppression techniques used in language models only mask prohibited content at the output level, failing to eliminate the underlying concepts from the model's inter…
The paper introduces SB-ECC, a novel score-based decoder that models error correction as continuous-time denoising, achieving state-of-the-art performance across various code families and noise levels…
EncFormer is a novel two-party framework that significantly improves the efficiency and scalability of private Transformer inference by optimizing the combination of Fully Homomorphic Encryption (FHE)…
Echo is a joint-embedding predictive architecture that uses a single, pretrained ViT encoder to simultaneously perform speaker diarization, speech recognition, and dynamic source separation in a share…
The paper proposes VRPO, a reinforcement learning-based optimization strategy that replaces static alignment losses in diffusion models, significantly improving both convergence and image fidelity.
LayerRoute introduces a lightweight, input-conditioned adapter that selectively skips transformer blocks in agentic language models, achieving significant FLOPs reduction while improving performance.
LALE introduces a novel lightweight architecture that efficiently combines local convolutional features and global transformer context for land-cover segmentation, achieving superior efficiency and pe…
This paper benchmarks five positional encoding strategies for transformer-based EEG foundation models, concluding that the optimal encoding is task-dependent and no single strategy is universally supe…