~ similar to 2605.28567· 18 results
The paper theoretically analyzes the properties that optimal sparse autoencoder (SAE) dictionaries must satisfy, deriving constraints that explain observed SAE behaviors like hierarchical splitting an…
Adly Templeton, Tom Conerly, Jonathan Marcus, Jack Lindsey +22 more
The paper demonstrates that sparse autoencoders can successfully extract a large set of interpretable, causally influential features from the production-scale Claude 3 Sonnet language model.
The paper introduces Residualized Sparse Autoencoders (ReSAEs) to improve multi-layer interventions in transformers by training each layer on the residual activation, which better preserves cross-laye…
The paper demonstrates that integrating Sparse Autoencoders (SAEs) into transformer residual streams significantly enhances the robustness of Large Language Models against various jailbreak attacks by…
The study investigates the generalization of auto-generated natural-language labels for language model features, finding that while the underlying features show cross-lingual semantic consistency, the…
This paper demonstrates that Sparse Autoencoders (SAEs) can effectively steer Large Language Models (LLMs) on the AxBench benchmark, achieving performance comparable to LoRA baselines when combined wi…
Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more
The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…
The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…
Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang +1 more
The paper proposes Cross-Layer Sparse Attention (CLSA) to significantly improve the efficiency and accuracy of long-context LLMs by jointly optimizing KV-cache sharing and the routing index across dec…
The paper introduces Geodesic Flow Matching, a manifold-aware denoising technique that adapts Riemannian transport dynamics to accurately clean high-dimensional structured representations like Spatial…
Calvin Yeung, Prathyush Poduval, Ali Zakeri, Zhuowen Zou +1 more
The paper introduces residualized temporal Sparse Autoencoders (SAEs) to analyze the full spatiotemporal structure of activations generated during the iterative denoising process of diffusion models,…
Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more
The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…
WenZhang Wei, Zhipeng Gui, Dehua Peng, Tiandi Ye +1 more
The paper proposes a Variational Adapter (VACSR) to improve cross-modal similarity representation by treating fine-grained image-text matching as a variational inference problem, thereby mitigating th…
The paper introduces Semantic Flow Regularization (SFR), an auxiliary objective that significantly improves the diversity and quality of LLM responses when fine-tuned for specific styles or personas,…
The paper proposes explicitly disentangling positional and semantic representations in Transformer encoders, demonstrating that this separation allows for a clearer understanding of how positional inf…
The paper introduces Latent Terms, a method that shows dense retrieval models implicitly learn sparse, Zipfian vocabularies that can be used for classical BM25-style sparse scoring without requiring s…
PrunePath introduces a budget-adaptive structured sparsification framework that efficiently prunes Feed-forward networks in large language models, achieving hardware-friendly sparsity and measurable s…
Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng +2 more
The paper introduces Single-stage Sparse Retrieval (SSR), a method that replaces computationally expensive vector clustering with sparse autoencoding to achieve highly efficient multi-vector retrieval…