ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2605.28567 · 18 results

q-bio.NC · cs.LG · Recent · Jun 1, 2026

How Optimality Structures Sparse Dictionaries: A Theory for Understanding SAE Representations

William Dorrell

The paper theoretically analyzes the properties that optimal sparse autoencoder (SAE) dictionaries must satisfy, deriving constraints that explain observed SAE behaviors like hierarchical splitting an…

cs.AI · Recent · May 28, 2026

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Adly Templeton, Tom Conerly, Jonathan Marcus, Jack Lindsey +22 more

The paper demonstrates that sparse autoencoders can successfully extract a large set of interpretable, causally influential features from the production-scale Claude 3 Sonnet language model.

cs.LG · cs.AI · Recent · May 27, 2026

ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions

Prathyush Poduval, Calvin Yeung, Neel Desai, Mohsen Imani

The paper introduces Residualized Sparse Autoencoders (ReSAEs) to improve multi-layer interventions in transformers by training each layer on the residual activation, which better preserves cross-laye…

cs.LG · cs.AI · cs.CL · Recent · Apr 20, 2026

Towards Understanding the Robustness of Sparse Autoencoders

Ahson Saiyed, Sabrina Sadiekh, Chirag Agarwal

The paper demonstrates that integrating Sparse Autoencoders (SAEs) into transformer residual streams significantly enhances the robustness of Large Language Models against various jailbreak attacks by…

cs.CL · Recent · May 29, 2026

How Far Do Auto-Interpretation Labels Generalize: A Controlled Study Across Languages, Scripts, and Rewordings

Sripad Karne

The study investigates the generalization of auto-generated natural-language labels for language model features, finding that while the underlying features show cross-lingual semantic consistency, the…

cs.CL · cs.AI · cs.LG · Recent · May 29, 2026

Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines

Mikkel Godsk Jørgensen, Lars Kai Hansen

This paper demonstrates that Sparse Autoencoders (SAEs) can effectively steer Large Language Models (LLMs) on the AxBench benchmark, achieving performance comparable to LoRA baselines when combined wi…

cs.LG · cs.AI · Recent · May 27, 2026

Locality-Aware Redundancy Pruning for LLM Depth Compression

Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more

The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…

cs.CL · cs.AI · Recent · Jun 1, 2026

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

Elia Cunegatti, Marcus Vukojevic, Erik Nielsen, Giovanni Iacca

The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…

cs.CL · cs.AI · cs.LG · Recent · Jun 4, 2026

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang +1 more

The paper proposes Cross-Layer Sparse Attention (CLSA) to significantly improve the efficiency and accuracy of long-context LLMs by jointly optimizing KV-cache sharing and the routing index across dec…

cs.AI · Recent · May 29, 2026

Geodesic Flow Matching for Denoising High-Dimensional Structured Representations

Karim Habashy, Chris Eliasmith

The paper introduces Geodesic Flow Matching, a manifold-aware denoising technique that adapts Riemannian transport dynamics to accurately clean high-dimensional structured representations like Spatial…

cs.CV · cs.AI · cs.LG · Recent · May 27, 2026

Residualized Temporal Sparse Autoencoders for Interpreting Diffusion Models

Calvin Yeung, Prathyush Poduval, Ali Zakeri, Zhuowen Zou +1 more

The paper introduces residualized temporal Sparse Autoencoders (SAEs) to analyze the full spatiotemporal structure of activations generated during the iterative denoising process of diffusion models,…

cs.LG · cs.AI · Recent · May 30, 2026

Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling

Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more

The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…

cs.CV · cs.AI · Recent · May 29, 2026

Variational Adapter for Cross-modal Similarity Representation

WenZhang Wei, Zhipeng Gui, Dehua Peng, Tiandi Ye +1 more

The paper proposes a Variational Adapter (VACSR) to improve cross-modal similarity representation by treating fine-grained image-text matching as a variational inference problem, thereby mitigating th…

cs.CL · cs.AI · Recent · May 27, 2026

Semantic Flow Regularization: Teaching LLMs to Generate Diverse Yet Coherent Responses

Kerui Peng, Feifei Li, Xingyu Fan, Wenhui Que

The paper introduces Semantic Flow Regularization (SFR), an auxiliary objective that significantly improves the diversity and quality of LLM responses when fine-tuned for specific styles or personas,…

cs.CL · cs.AI · Recent · May 28, 2026

Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders

Pierre-Antoine Lequeu, Camille Barboule, Benjamin Piwowarski

The paper proposes explicitly disentangling positional and semantic representations in Transformer encoders, demonstrating that this separation allows for a clearer understanding of how positional inf…

cs.IR · cs.AI · cs.CL · Recent · May 28, 2026

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

Benjamin Clavié, Sean Lee, Aamir Shakir, Makoto P. Kato

The paper introduces Latent Terms, a method that shows dense retrieval models implicitly learn sparse, Zipfian vocabularies that can be used for classical BM25-style sparse scoring without requiring s…

cs.CL · cs.AI · Recent · May 27, 2026

PrunePath: Towards Highly Structured Sparse Language Models

Zhexuan Gu, Zixun Fu, Yancheng Yuan

PrunePath introduces a budget-adaptive structured sparsification framework that efficiently prunes feed-forward networks in large language models, achieving hardware-friendly sparsity and measurable s…

cs.IR · cs.AI · cs.LG · Recent · May 28, 2026

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng +2 more

The paper introduces Single-stage Sparse Retrieval (SSR), a method that replaces computationally expensive vector clustering with sparse autoencoding to achieve highly efficient multi-vector retrieval…
