Papers similar to 2606.12412

Similar to 2606.12412 · 18 results

cs.CV · cs.AI · cs.CL · Recent · May 31, 2026

On the Limits of Token Reduction for Efficient Unified Vision Language Training

Siyi Chen, Weiming Zhuang, Jingtao Li, Lingjuan Lv

The paper analyzes token reduction for efficient unified VLM training, finding that while task-specific acceleration saves computation, it destroys the mutual performance gains achieved through joint…

cs.CV · cs.AI · Recent · May 28, 2026

OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning

Geng Li, Guohao Chen, Ting Chen, Shilin Shan +5 more

OccamToken introduces a training-free, adaptive token pruning framework that replaces fixed token budgets with relative evidence testing against a register-based reference, significantly improving VLM…

cs.CL · cs.AI · cs.LG · Recent · Jun 4, 2026

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang +1 more

The paper proposes Cross-Layer Sparse Attention (CLSA) to significantly improve the efficiency and accuracy of long-context LLMs by jointly optimizing KV-cache sharing and the routing index across dec…

cs.AI · Recent · May 27, 2026

CIVIC: End-to-End Sequence Compactness for Efficient Vision-Language Models

Fengze Yang, Bo Yu, Xuewen Luo, Cathy Liu +1 more

CIVIC is a path-consistent compact visual inference framework that achieves genuine hardware efficiency in Vision-Language Models by maintaining contiguous sequence representations across all inferenc…

cs.LG · cs.AI · Recent · May 27, 2026

Locality-Aware Redundancy Pruning for LLM Depth Compression

Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more

The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…

cs.CV · cs.AI · cs.CL · Recent · May 28, 2026

PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding

Selim Kuzucu, Alessio Tonioni, Vasile Lup, Bernt Schiele +2 more

PARCEL introduces a novel visual tokenization architecture that combines spatial pooling anchors with conditioned elastic queries, efficiently reducing the computational cost of large Vision-Language…

cs.AI · Recent · Jun 1, 2026

RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering

Yuyang Li, Zihe Yan, Tobias Käfer

RASER introduces a family of cheap, router-based systems that selectively decide whether to perform expensive multi-hop retrieval, significantly reducing LLM token costs while maintaining state-of-the…

cs.CV · cs.CL · Recent · Jun 1, 2026

InfoMerge: Information-aware Token Compression for Efficient Video Large Language Models

Xinxin Liu, Shiwei Gan, Xiao Liu, Yafeng Yin +2 more

InfoMerge is a novel, training-free method that significantly compresses visual tokens for Video-LLMs by estimating temporal redundancy and allocating tokens based on content richness, achieving high…

cs.CV · cs.CL · Recent · May 31, 2026

Reasmory: 3D Reconstruction as Explicit Memory for VLMs Spatial Reasoning

Jixuan He, Xueting Li, Chieh Hubert Lin, Ming-Hsuan Yang

Reasmory introduces a structured programming framework that uses explicit 3D memory and a Domain-Specific Language (DSL) to reliably enhance Vision-Language Models' spatial reasoning capabilities, ach…

cs.CV · cs.AI · Recent · May 30, 2026

V-LynX: Token Interface Alignment for Video+X LLMs

Jungin Park, Jiyoung Lee, Kwanghoon Sohn

V-LynX is a framework that enhances Video LLMs by integrating new modalities into their existing token interface, achieving state-of-the-art performance across diverse video understanding tasks.

cs.CL · Recent · May 31, 2026

Revise, Don't Freeze: Sampler-Matched Training for Self-Correcting Masked Diffusion Language Models

Longxuan Yu, Shaorong Zhang, Yu Fu, Hui Liu +2 more

The paper introduces D3IM, a novel parameter-free sampler that enables direct revision of visible tokens in Masked Diffusion Language Models, and proposes SCOPE to mitigate the model's tendency to per…

cs.DC · cs.AI · cs.NI · Recent · May 31, 2026

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

Bole Ma, Jan Eitzinger, Harald Köstler, Gerhard Wellein

The paper proposes moving the query instead of the KV-cache during cross-instance attention, demonstrating that this approach is significantly cheaper than moving the cache, especially on modern GPU f…

cs.CV · cs.AI · cs.CL · Recent · Jun 1, 2026

AdaCodec: A Predictive Visual Code for Video MLLMs

Haowen Hou, Zhen Huang, Zheming Liang, Qingyi Si +7 more

AdaCodec introduces a predictive visual coding scheme for video MLLMs, significantly improving efficiency and performance by transmitting only inter-frame changes and full reference frames when necess…

cs.CL · cs.AI · Recent · May 27, 2026

PrunePath: Towards Highly Structured Sparse Language Models

Zhexuan Gu, Zixun Fu, Yancheng Yuan

PrunePath introduces a budget-adaptive structured sparsification framework that efficiently prunes feed-forward networks in large language models, achieving hardware-friendly sparsity and measurable s…

cs.CV · cs.AI · Recent · May 27, 2026

ROVER: Routing Object-Centric Visual Evidence for Grounded Multi-Image Reasoning

Guannan Lv, Ren Nie, Hongjian Dou, Tingting Gao

ROVER is a lightweight, learnable plugin that efficiently routes and integrates object-centric visual evidence across multiple images and objects, significantly improving performance on grounded multi…

cs.CL · cs.AI · Recent · Jun 1, 2026

AlphaToken: Decoupling Adaptation and Stability for Path-Aware Response Token Valuation in LLM Post-Training

Liu Qing, Ou Wu, Yi Du

AlphaToken is a novel response token valuation framework that improves LLM post-training by decoupling token selection into task-specific adaptation and stability preservation, leading to better perfo…

cs.CV · cs.AI · Recent · May 29, 2026

Beyond Classification: Dynamic Adapter Routing for Continual Multimodal Retrieval

Alicja Dobrzeniecka, Filip Szatkowski, Sebastian Cygert, Szymon Lukasik +1 more

The paper proposes Dynamic Adapter Routing (DAR), a novel method that significantly improves continual multimodal retrieval by adaptively selecting and merging specialized adapters.

cs.CV · Recent · Jun 1, 2026

LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models

Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more

The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…
