ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2606.02092· 19 results

cs.CVcs.AIRecentMay 27, 2026

FLORO: A Multimodal Geospatial Foundation Model for Ecological Remote Sensing Across Sensors and Scales

Jorge L. Rodriguez, Victor Angulo Morales, Areej Alwahas, Mariana Elias Lara +5 more

FLORO is a multimodal geospatial foundation model that learns transferable remote sensing representations from a small, diverse corpus, achieving strong performance across various sensor types and res…

View →
cs.CVcs.AIcs.RORecentMay 28, 2026

Energy-Aware NECO for Single-Pass Pixel-wise Out-of-Distribution Detection in Semantic Segmentation

Boyuan Zhang, Huanshan Huang, Yifei Cao

The paper proposes Energy-Aware NECO, a single-pass hybrid detector that combines geometric ratio and logit-based energy scores to achieve superior pixel-wise out-of-distribution detection for semanti…

View →
cs.AIRecentJun 1, 2026

Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models

Steffen Knoblauch, Hao Li, Gengchen Mai, Konstantin Klemmer +2 more

The paper advocates for a paradigm shift toward joint Spatial Representation Learning (SRL) that unifies raster imagery and structured vector data into a single embedding space for developing more sem…

View →
cs.CVcs.LGRecentJun 1, 2026

Deep Learning for Remote Sensing to Improve Flood Inundation Mapping

Yogesh Bhattarai, Vijay Chaudhary, Wai Lim Kim, Sanjib Sharma

This paper introduces a novel cloud-removal framework using Denoising Diffusion Probabilistic Models and a Masked Diffusion Transformer to generate cloud-free multispectral flood imagery, significantl…

View →
cs.CVcs.AIcs.CLRecentMay 31, 2026

On the Limits of Token Reduction for Efficient Unified Vision Language Training

Siyi Chen, Weiming Zhuang, Jingtao Li, Lingjuan Lv

The paper analyzes token reduction for efficient unified VLM training, finding that while task-specific acceleration saves computation, it destroys the mutual performance gains achieved through joint…

View →
cs.CVcs.AIRecentJun 1, 2026

Parameter-Efficient Fine-Tuning of Large Pretrained Models for Instance Segmentation Tasks

Nermeen Abou Baker, David Rohrschneider, Uwe Handmann

This paper investigates the application of Parameter-Efficient Fine-Tuning (PEFT) methods, specifically adapters and LoRA, to large pretrained models for instance segmentation, demonstrating that thes…

View →
cs.CVcs.RORecentJun 3, 2026

CIPER: A Unified Framework for Cross-view Image-retrieval and Pose-estimation

Yurim Jeon, Dongseong Seo, Seung-Woo Seo

CIPER proposes a unified transformer framework to simultaneously perform cross-view image retrieval and precise 3-DoF pose estimation, overcoming the limitations of cascaded, separate methods.

View →
cs.LGcs.AIRecentMay 27, 2026

Locality-Aware Redundancy Pruning for LLM Depth Compression

Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more

The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…

View →
cs.CVRecentJun 1, 2026

LL-Bench: Rethinking Low-Level Vision Evaluation in the Era of Large-Scale Generative Models

Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more

The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…

View →
cs.CVcs.AIRecentMay 29, 2026

Redefining Instance Matching: A Unified Framework for Part-Aware Matching in Panoptic Segmentation Evaluation

Erik Großkopf, Soumya Snigdha Kundu, Hendrik Möller, Nicolas Münster +8 more

The paper proposes a unified framework to systematically redefine instance matching for Panoptic Quality evaluation, moving beyond the standard One-to-One matching to accommodate complex scenarios lik…

View →
cs.CVcs.AIRecentJun 1, 2026

Attention mechanisms and transfer learning for robust peach leaf damage classification under domain shift

Adrián Cánovas-Rodriguez, Miguel A. González-Illán, Maria Fernanda García-Cruz, Pedro Nortes Tortosa +4 more

The paper proposes an attention-enhanced deep learning framework using EfficientNet and CBAM to achieve high accuracy (93.3%) in classifying peach leaf damage, demonstrating improved robustness under…

View →
cs.CVcs.AIRecentMay 28, 2026

xModel-KD: Cross-modal Knowledge Distillation for 3D Scene Perception using LiDAR

Thenukan Pathmanathan, Kanchan Keisham, Thangarajah Akilan

The paper proposes xModel-KD, a cross-modal knowledge distillation framework, to improve 3D point cloud segmentation by effectively transferring rich appearance cues from 2D images to sparse 3D geomet…

View →
cs.CLcs.AIRecentMay 30, 2026

MLLM-Microscope: Unlocking Hidden Structure Within Multimodal Large Language Models

Ravil Mussabayev, Rustam Mussabayev

The paper introduces MLLM-Microscope, a system that analyzes the internal structure of multimodal large language models (MLLMs), finding that modality fusion significantly impacts the linearity and di…

View →
cs.CVcs.AIcs.CLRecentMay 28, 2026

PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding

Selim Kuzucu, Alessio Tonioni, Vasile Lup, Bernt Schiele +2 more

PARCEL introduces a novel visual tokenization architecture that combines spatial pooling anchors with conditioned elastic queries, efficiently reducing the computational cost of large Vision-Language…

View →
cs.CVcs.AIEmpiricalRecentJun 10, 2026

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

Cheng-Yu Yang, Shao-Yuan Lo, Yu-Lun Liu

肖代替了视觉令牌的永久删除,通过可恢复的路由来改进视觉语言模型的性能

View →
cs.CVcs.AIRecentMay 28, 2026

A Novel Global Context-aware Deep Neural Network for Enhanced Brain Tumor Segmentation using Magnetic Resonance Images

Sourjya Mukherjee, Ananya Bhattacharjee, R. Murugan

The paper proposes a novel Global Context-aware Squeeze and Excite Residual UNet (GCSER-UNet) network, which significantly enhances brain tumor segmentation accuracy on benchmark MRI datasets.

View →
cs.CVcs.RORecentJun 2, 2026

Exploring Easy Boosts for Lidar Semantic Scene Completion

Tetiana Martyniuk, Jonathan Seele, Alexandre Boulch, Gilles Puy +2 more

The paper shows that simple, non-architectural enhancements, such as adding semantic pseudo-labels and visibility information, can significantly boost Lidar Semantic Scene Completion performance.

View →
cs.CVcs.AIRecentMay 31, 2026

An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation

Bingyu Li, Da Zhang, Tao Huo, Zhiyuan Zhao +2 more

The paper introduces Multi-temporal Referring Segmentation (MTRS), a new task requiring models to segment language-described temporal changes, and proposes MTRefSeg-R1, a specialized framework that ac…

View →
cs.AIRecentMay 27, 2026

CIVIC: End-to-End Sequence Compactness for Efficient Vision-Language Models

Fengze Yang, Bo Yu, Xuewen Luo, Cathy Liu +1 more

CIVIC is a path-consistent compact visual inference framework that achieves genuine hardware efficiency in Vision-Language Models by maintaining contiguous sequence representations across all inferenc…

View →