ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2605.29350· 20 results

cs.CLcs.AIcs.LGRecentMay 27, 2026

Pruning and Distilling Mixture-of-Experts into Dense Language Models

Junhyuck Kim, Jihun Yun, Haechan Kim, Gyeongman Kim +2 more

The paper introduces a systematic framework to convert large Mixture-of-Experts (MoE) models into memory-efficient, fully dense architectures, achieving superior performance compared to traditional pr…

View →
cs.AIRecentMay 31, 2026

DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts

Jiarui Feng, Hanqing Zeng, Karish Grover, Ruizhong Qiu +10 more

The paper proposes DAG-MoE, a novel sparse Mixture-of-Experts framework that replaces standard weighted-sum aggregation with structural aggregation to enhance model performance and enable multi-step r…

View →
cs.CLRecentMay 29, 2026

dMoE: dLLMs with Learnable Block Experts

Sicheng Feng, Zigeng Chen, Gongfan Fang, Xinyin Ma +1 more

dMoE proposes a block-level Mixture-of-Experts (MoE) framework for Diffusion Large Language Models (dLLMs) that aggregates token-level expert distributions into a unified block-level distribution, sig…

View →
cs.CLcs.AIRecentMay 27, 2026

Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models

Guanzhi Deng, Kuan Wu, Haibo Wang, Shing Yin Wong +2 more

The paper introduces RA-MoE, a novel fine-tuning framework that leverages the internal routing structure of Mixture-of-Experts (MoE) models to improve performance on multilingual downstream tasks by a…

View →
cs.CLcs.AIcs.LGRecentMay 27, 2026

Extracting Small Translation Specialists from LLMs by Aggressively Pruning Experts

Liu O. Martin, Lucas Bandarkar, Nanyun Peng

The paper proposes an aggressive, parameter-efficient method to prune non-essential experts from Mixture-of-Experts (MoE) LLMs, significantly compressing the model while maintaining high machine trans…

View →
cs.CLcs.AIRecentJun 1, 2026

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

Elia Cunegatti, Marcus Vukojevic, Erik Nielsen, Giovanni Iacca

The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…

View →
cs.LGcs.AIRecentJun 1, 2026

DOT-MoE: Differentiable Optimal Transport for MoEfication

Udbhav Bamba, Arnav Chavan, Aryamaan Thakur, Steve Teig +1 more

DOT-MoE introduces a novel framework that treats the decomposition of dense layers into Mixture of Experts (MoE) as a Differentiable Optimal Transport problem, achieving superior efficiency while pres…

View →
cs.CLRecentMay 29, 2026

Mellum2 Technical Report

Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok +5 more

Mellum 2 is an open-weight 12B Mixture-of-Experts (MoE) language model specialized for software engineering, achieving performance competitive with larger models while maintaining the efficiency of a…

View →
cs.CRRecentApr 30, 2026

MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

Jona te Lintelo, Lichao Wu, Marina Krček, Sengim Karayalçin +1 more

MASCing is a novel framework that enables flexible, non-retraining reconfiguration of Mixture-of-Experts (MoE) models for specific safety objectives by applying activation steering masks to control ex…

View →
cs.LGcs.AIRecentJun 1, 2026

ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts

Heng Zhao, Zilei Shao, Guy Van den Broeck, Zhe Zeng

The paper introduces ProbMoE, a probabilistic routing framework that tackles the non-differentiability of top-$k$ routing in Mixture-of-Experts (MoE) models, achieving strong performance with improved…

View →
cs.LGcs.AIcs.CLRecentMay 29, 2026

PithTrain: A Compact and Agent-Native MoE Training System

Ruihang Lai, Hao Kang, Haozhan Tang, Akaash R. Parthasarathy +5 more

The paper introduces PithTrain, a compact, agent-native Mixture-of-Experts (MoE) training framework that significantly improves agent-task efficiency compared to existing production stacks.

View →
cs.LGcs.AIcs.CLRecentMay 14, 2026

MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification

Weisen Jiang, Shuhao Chen, Sinno Jialin Pan

MetaMoE introduces a privacy-preserving framework that unifies independently trained, domain-specialized experts into a single Mixture-of-Experts (MoE) model using diversity-aware proxy data.

View →
cs.LGcs.AIcs.CLRecentMay 30, 2026

MESA: Improving MoE Safety Alignment via Decentralized Expertise

Yitong Sun, Yao Huang, Teng Li, Ranjie Duan +4 more

MESA is a targeted alignment framework that decentralizes safety responsibilities across multiple experts in Mixture-of-Experts (MoE) LLMs using Optimal Transport theory, thereby improving safety robu…

View →
cs.PLcs.AIcs.CLRecentMay 27, 2026

FPMoE: A Sparse Mixture-of-Experts Approach to Functional Code Generation

Loc Pham, Lang Hong Nguyet Anh, Thanh Le-Cong

FPMoE introduces a sparse Mixture-of-Experts (MoE) architecture to improve functional code generation across multiple functional programming languages, achieving state-of-the-art performance with fewe…

View →
cs.AIRecentJun 1, 2026

Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse LLMs via Conformal Prediction

Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang +2 more

This paper investigates whether model compression techniques (like quantization and pruning) preserve a Large Language Model's ability to quantify its own uncertainty, finding that accuracy-only evalu…

View →
cs.LGcs.AIRecentMay 31, 2026

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference

Zhiyao Xu, Aoxue Liu, Zhanjie Ding, Dan Zhao +2 more

The paper proposes Task-Aware Coactivation Grouping (TACG) to significantly reduce communication costs in multi-task MoE inference by grouping experts based on task-specific co-activation patterns, ou…

View →
cs.CLRecentMay 29, 2026

GRKV: Global Regression for Training-Free KV Cache Compression in Long-Context LLMs

Junjie Peng, You Wu, Haoyi Wu, Jialong Han +3 more

GRKV introduces a training-free KV-cache merging method that uses global regression to distribute information from evicted tokens, solving the over-merging problem inherent in span-based retention.

View →
cs.CLRecentMay 29, 2026

MoG: Mixture of Experts for Graph-based Retrieval-Augmented Generation

Zheng Yuan, Chuang Zhou, Linhao Luo, Siyu An +3 more

MoG proposes a novel Mixture of Experts framework for graph-based RAG, which uses hub graphs to guide the sparse activation of domain-specific expert graphs, significantly improving retrieval accuracy…

View →
cs.CVcs.AIRecentMay 29, 2026

Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models

Zijie Zhou, Dandan Zhu, Hangxiangpan Wang, Heng Zhang +2 more

The paper proposes AsyMoE, a novel Mixture of Experts architecture for Large Vision-Language Models that explicitly models the inherent asymmetry between visual and linguistic modalities, achieving si…

View →
cs.CLRecentJun 1, 2026

ResMerge: Residual-based Spectral Merging of Large Language Models

Yandu Sun, Zhiyan Hou, Haokai Ma, Yuheng Jia +5 more

ResMerge proposes a residual-based spectral merging framework that improves the combination of multiple reinforcement learning (RL) expert models by stabilizing the aggregation process using a residua…

View →