~ similar to 2606.00548· 18 results
FLORO is a multimodal geospatial foundation model that learns transferable remote sensing representations from a small, diverse corpus, achieving strong performance across various sensor types and res…
Steffen Knoblauch, Hao Li, Gengchen Mai, Konstantin Klemmer +2 more
The paper advocates for a paradigm shift toward joint Spatial Representation Learning (SRL) that unifies raster imagery and structured vector data into a single embedding space for developing more sem…
Adrián Cánovas-Rodriguez, Miguel A. González-Illán, Maria Fernanda García-Cruz, Pedro Nortes Tortosa +4 more
The paper proposes an attention-enhanced deep learning framework using EfficientNet and CBAM to achieve high accuracy (93.3%) in classifying peach leaf damage, demonstrating improved robustness under…
Places in the Wild introduces a massive, high-resolution RAW photograph dataset of 67,574 images captured in situ across 810 locations, providing unprecedented detail for ecologically valid vision res…
The paper introduces a knowledge distillation framework to adapt a dead tree detection model trained on one geographical area (Finland) to multiple diverse forest types (Poland, Germany, Estonia), ach…
ROVER is a lightweight, learnable plugin that efficiently routes and integrates object-centric visual evidence across multiple images and objects, significantly improving performance on grounded multi…
The paper introduces SPARROW, an autonomous, open-source platform that uses solar power, edge AI, and satellite communication to enable continuous, scalable biodiversity monitoring in remote global ec…
This paper introduces a novel cloud-removal framework using Denoising Diffusion Probabilistic Models and a Masked Diffusion Transformer to generate cloud-free multispectral flood imagery, significantl…
LALE introduces a novel lightweight architecture that efficiently combines local convolutional features and global transformer context for land-cover segmentation, achieving superior efficiency and pe…
Yu Xue, Haoxuan Qu, Zhuoling Li, Yihang Lou +3 more
The paper introduces ToolFG, a novel tool-integrated MLLM framework that enhances fine-grained image classification by enabling models to autonomously use external tools to gather verifiable visual cu…
The paper identifies a fundamental mismatch between standard pairwise ranking metrics (like AP and FPR-95) and the true assignment objective in multi-view object association, proposing a Sinkhorn-base…
CIPER proposes a unified transformer framework to simultaneously perform cross-view image retrieval and precise 3-DoF pose estimation, overcoming the limitations of cascaded, separate methods.
This paper provides the first comprehensive threat model for IoT-enabled Controlled Environment Agriculture (CEA) systems, identifying 123 unique threats and proposing a defense-in-depth framework to…
The paper introduces a robust, four-mechanism LLM pipeline that generates auditable, evidence-grounded structured trait records for hundreds of thousands of diverse species across multiple taxa.
Jiacong Liu, Shu Luo, Yikai Qin, Yaze Zhao +2 more
GiPL proposes a novel two-branch framework combining iterative pseudo-label self-training and generative data augmentation to significantly improve Cross-Domain Few-Shot Object Detection by better uti…
This paper proposes a novel volumetric change detection sub-task for monitoring slope instabilities using time-lapse cameras, demonstrating that dense and semi-dense feature matching techniques are ro…
The paper introduces CAIAMAR, a multi-agent reasoning framework that achieves context-aware and high-fidelity anonymization of personally identifiable information (PII) in street imagery, significantl…
The paper introduces MetricScenes, a new large-scale, in-the-wild dataset, and demonstrates that fine-tuning existing geometry models on this dataset significantly mitigates the scale-collapse problem…