Papers similar to 2606.01509

20 results

cs.CL · cs.AI · Recent · May 27, 2026

Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models

Guanzhi Deng, Kuan Wu, Haibo Wang, Shing Yin Wong +2 more

The paper introduces RA-MoE, a novel fine-tuning framework that leverages the internal routing structure of Mixture-of-Experts (MoE) models to improve performance on multilingual downstream tasks by a…

cs.CR · Recent · May 6, 2026

Misrouter: Exploiting Routing Mechanisms for Input-Only Attacks on Mixture-of-Experts LLMs

Zekun Fei, Zihao Wang, Weijie Liu, Ruiqi He +3 more

Misrouter introduces an input-only adversarial framework to exploit the routing mechanisms of Mixture-of-Experts (MoE) LLMs, enabling unsafe behavior induction against remotely hosted, black-box servi…

cs.AI · Recent · May 31, 2026

DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts

Jiarui Feng, Hanqing Zeng, Karish Grover, Ruizhong Qiu +10 more

The paper proposes DAG-MoE, a novel sparse Mixture-of-Experts framework that replaces standard weighted-sum aggregation with structural aggregation to enhance model performance and enable multi-step r…

cs.LG · cs.AI · Recent · Jun 1, 2026

DOT-MoE: Differentiable Optimal Transport for MoEfication

Udbhav Bamba, Arnav Chavan, Aryamaan Thakur, Steve Teig +1 more

DOT-MoE introduces a novel framework that treats the decomposition of dense layers into Mixture of Experts (MoE) as a Differentiable Optimal Transport problem, achieving superior efficiency while pres…

cs.AI · cs.CR · Recent · May 22, 2026

Safety-Oriented Routing Analysis of Mixtral MoE Under Benign and Harmful Prompts

Md Nurul Absar Siddiky

The paper analyzes the routing behavior of Mixtral MoE under benign and harmful prompts using activation and gradient signals, finding that safety-relevant routing is subtle, depth-dependent, and dist…

cs.LG · cs.AI · cs.CL · Empirical · Recent · Jun 10, 2026

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Songhao Wu, Ang Lv, Ruobing Xie, Yankai Lin

This paper redesigns Mixture-of-Experts routers using Manifold Power Iteration, aligning router rows with the principal singular directions of their associated experts.

cs.LG · cs.AI · Recent · May 31, 2026

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference

Zhiyao Xu, Aoxue Liu, Zhanjie Ding, Dan Zhao +2 more

The paper proposes Task-Aware Coactivation Grouping (TACG) to significantly reduce communication costs in multi-task MoE inference by grouping experts based on task-specific co-activation patterns, ou…

cs.AI · Recent · May 28, 2026

ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression

Yilun Yao, Jiaming Pan, Elsie Dai, Peizhuang Cong +2 more

ConMoE proposes a train-free method for compressing Mixture-of-Experts (MoE) models by consolidating the large expert pool into a smaller set of reusable prototypes and deterministically remapping all…

cs.LG · cs.AI · Recent · May 29, 2026

PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning

Daize Dong, Junlin Chen, Haolong Jia, Jiawei Wu +8 more

The paper proposes Predictive Routing Replay (PR2) to stabilize reinforcement learning on Mixture of Experts (MoE) LLMs by predicting and incorporating short-horizon router evolution during training a…

cs.LG · cs.AI · cs.CL · Recent · May 30, 2026

MESA: Improving MoE Safety Alignment via Decentralized Expertise

Yitong Sun, Yao Huang, Teng Li, Ranjie Duan +4 more

MESA is a targeted alignment framework that decentralizes safety responsibilities across multiple experts in Mixture-of-Experts (MoE) LLMs using Optimal Transport theory, thereby improving safety robu…

cs.CL · Recent · May 29, 2026

dMoE: dLLMs with Learnable Block Experts

Sicheng Feng, Zigeng Chen, Gongfan Fang, Xinyin Ma +1 more

dMoE proposes a block-level Mixture-of-Experts (MoE) framework for Diffusion Large Language Models (dLLMs) that aggregates token-level expert distributions into a unified block-level distribution, sig…

cs.CR · Recent · Apr 30, 2026

MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

Jona te Lintelo, Lichao Wu, Marina Krček, Sengim Karayalçin +1 more

MASCing is a novel framework that enables flexible, non-retraining reconfiguration of Mixture-of-Experts (MoE) models for specific safety objectives by applying activation steering masks to control ex…

math.DS · cs.AI · cs.LG · Recent · May 27, 2026

A Minimal Bifurcation Model of Load Imbalance in a Softmax Mixture-of-Experts Router

O. M. Kiselev

The paper develops a minimal dynamical model showing that adaptive softmax routing in Mixture-of-Experts (MoE) layers can undergo abrupt transitions to load imbalance via bifurcation mechanisms.

cs.CL · Recent · May 29, 2026

MoG: Mixture of Experts for Graph-based Retrieval-Augmented Generation

Zheng Yuan, Chuang Zhou, Linhao Luo, Siyu An +3 more

MoG proposes a novel Mixture of Experts framework for graph-based RAG, which uses hub graphs to guide the sparse activation of domain-specific expert graphs, significantly improving retrieval accuracy…

cs.CL · cs.AI · cs.LG · Recent · May 27, 2026

Pruning and Distilling Mixture-of-Experts into Dense Language Models

Junhyuck Kim, Jihun Yun, Haechan Kim, Gyeongman Kim +2 more

The paper introduces a systematic framework to convert large Mixture-of-Experts (MoE) models into memory-efficient, fully dense architectures, achieving superior performance compared to traditional pr…

cs.LG · cs.AI · cs.CL · Recent · May 14, 2026

MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification

Weisen Jiang, Shuhao Chen, Sinno Jialin Pan

MetaMoE introduces a privacy-preserving framework that unifies independently trained, domain-specialized experts into a single Mixture-of-Experts (MoE) model using diversity-aware proxy data.

cs.LG · cs.AI · Recent · May 28, 2026

Graph-Conditioned Mixture of Graph Neural Network Experts for Traffic Forecasting

Amirhossein Ghaffari, Saeid Sheikhi, Ekaterina Gilman

The paper proposes GC-MoE, a graph-conditioned Mixture of Experts framework, to improve traffic forecasting by assigning personalized, specialized forecasting experts to individual road segments.

cs.LG · cs.CL · Recent · May 30, 2026

Confidence-Adaptive SwiGLU for Mixture-of-Experts

Shaohua Li, Xiuchao Sui, Xiaobing Sun, Yuhang Wu +3 more

The paper introduces Confidence-Adaptive SwiGLU (κ-SwiGLU), a novel gating mechanism for Mixture-of-Experts (MoE) models that dynamically adjusts the gate sharpness based on token-level routing conf…

cs.CR · cs.AR · cs.CL · Recent · May 24, 2026

RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry

Bo Lv, Zhiheng Xu, KeDong Xiu, Ruyi Ding +3 more

RouteScan introduces a non-intrusive framework that audits the safety of Mixture-of-Experts (MoE) LLMs by analyzing low-level GPU expert routing telemetry, achieving high accuracy even on unseen harmf…

cs.AI · cs.LG · Recent · May 27, 2026

Continual Model Routing in Evolving Model Hubs

Jack Bell, Giacomo Carfì, Gerlando Gramaglia, Vincenzo Lomonaco

The paper addresses the challenge of routing across rapidly expanding model hubs by proposing CARvE, a contrastive embedding approach that significantly improves continual model selection accuracy.
