ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2606.02374· 19 results

cs.CVcs.AIRecentMay 27, 2026

FLORO: A Multimodal Geospatial Foundation Model for Ecological Remote Sensing Across Sensors and Scales

Jorge L. Rodriguez, Victor Angulo Morales, Areej Alwahas, Mariana Elias Lara +5 more

FLORO is a multimodal geospatial foundation model that learns transferable remote sensing representations from a small, diverse corpus, achieving strong performance across various sensor types and res…

View →
cs.CVRecentJun 1, 2026

From Extrinsic to Intrinsic: Geodesic-Guided Representation Learning for 3D Geometric Data

Yuming Zhao, Junhui Hou, Qijian Zhang, Jia Qin +1 more

The paper introduces PRISM, a novel representation learning framework that learns isometric embeddings by explicitly modeling the intrinsic geodesic metric of 3D surfaces, achieving superior performan…

View →
cs.CVcs.AIcs.LGRecentMay 30, 2026

CAFOSat: A Strongly Annotated Dataset for Infrastructure-Aware CAFO Mapping Using High-Resolution Imagery

Oishee Bintey Hoque, Nibir Chandra Mandal, Mandy L Wilson, Samarth Swarup +2 more

The paper introduces CAFOSat, a large-scale, strongly annotated, and infrastructure-aware dataset designed to improve the accuracy of mapping Concentrated Animal Feeding Operations (CAFOs) from high-r…

View →
cs.CVcs.CLRecentMay 31, 2026

Reasmory: 3D Reconstruction as Explicit Memory for VLMs Spatial Reasoning

Jixuan He, Xueting Li, Chieh Hubert Lin, Ming-Hsuan Yang

Reasmory introduces a structured programming framework that uses explicit 3D memory and a Domain-Specific Language (DSL) to reliably enhance Vision-Language Models' spatial reasoning capabilities, ach…

View →
cs.CLcs.AIRecentMay 29, 2026

The Sword, Shield, and Achilles' Heel: Characterizing the Linguistic Inductive Bias of Large Language Models for Spatial Reasoning in Navigation Planning

Xudong Zhang, Jian Yang, Shengkai Wang, Jiangpeng Tian +4 more

The paper proposes a dual-interventional framework to characterize how linguistic structures and contextual cues influence LLMs' spatial reasoning for navigation, finding that topological information…

View →
cs.CVRecentJun 1, 2026

Places in the Wild: A Large, High-Resolution RAW Photograph Dataset for Ecologically Valid Vision Research

Michelle R. Greene

Places in the Wild introduces a massive, high-resolution RAW photograph dataset of 67,574 images captured in situ across 810 locations, providing unprecedented detail for ecologically valid vision res…

View →
cs.CVcs.AIRecentMay 29, 2026

ERGeoBench:A Comprehensive Benchmark for Embodied Reasoning and Geo-localization in Multimodal Large Language Models

Kaiwen Xue, Tao Wei, Guoxin Zhang, Zhonghong Ou +4 more

The paper introduces ERGeoBench, a comprehensive diagnostic benchmark designed to evaluate the fine-grained capabilities of multimodal large language models (MLLMs) for embodied geo-localization acros…

View →
cs.CVcs.AIEmpiricalRecentJun 11, 2026

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Seokju Cho, Ryo Hachiuma, Abhishek Badki, Hang Su +7 more

This paper proposes SpatialClaw, a training-free framework for spatial reasoning that enables open-ended, complex 3D/4D spatial reasoning.

View →
cs.CVRecentJun 1, 2026

Cross-Domain Dead Tree Detection via Knowledge Distillation in Aerial Imagery

Anis Ur Rahman, Mete Ahishali, Einari Heinaro, Samuli Junttila

The paper introduces a knowledge distillation framework to adapt a dead tree detection model trained on one geographical area (Finland) to multiple diverse forest types (Poland, Germany, Estonia), ach…

View →
cs.AIcs.DBcs.IRRecentMay 29, 2026

Vector Linking via Cross-Model Local Isometric Consistency

Ziying Chen, Yang Cao, He Sun, Beining Yang +1 more

The paper proposes a novel geometric embedding hashing method to recover object correspondences (vector links) between two embedding clouds generated by different black-box encoders using only a small…

View →
cs.CVcs.AIRecentMay 27, 2026

ROVER: Routing Object-Centric Visual Evidence for Grounded Multi-Image Reasoning

Guannan Lv, Ren Nie, Hongjian Dou, Tingting Gao

ROVER is a lightweight, learnable plugin that efficiently routes and integrates object-centric visual evidence across multiple images and objects, significantly improving performance on grounded multi…

View →
cs.CVcs.RORecentJun 3, 2026

CIPER: A Unified Framework for Cross-view Image-retrieval and Pose-estimation

Yurim Jeon, Dongseong Seo, Seung-Woo Seo

CIPER proposes a unified transformer framework to simultaneously perform cross-view image retrieval and precise 3-DoF pose estimation, overcoming the limitations of cascaded, separate methods.

View →
cs.CVcs.LGRecentJun 1, 2026

Deep Learning for Remote Sensing to Improve Flood Inundation Mapping

Yogesh Bhattarai, Vijay Chaudhary, Wai Lim Kim, Sanjib Sharma

This paper introduces a novel cloud-removal framework using Denoising Diffusion Probabilistic Models and a Masked Diffusion Transformer to generate cloud-free multispectral flood imagery, significantl…

View →
cs.CVRecentJun 4, 2026

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

Chenming Zhu, Jingli Lin, Yilin Long, Peizhou Cao +3 more

The paper proposes Astra, an agentic framework that equips Vision-Language Models (VLMs) with the ability to perform spatial reasoning by actively generating and utilizing imagined visual evidence fro…

View →
cs.CVcs.AIRecentMay 28, 2026

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang, Yi Ma +2 more

The paper proposes GASP, a framework that injects fundamental geometric priors directly into Vision-Language Models (VLMs) using ground-truth video geometry, significantly enhancing 3D spatial reasoni…

View →
cs.CVcs.AIRecentMay 28, 2026

VLM3: Vision Language Models Are Native 3D Learners

Zhipeng Cai, Zhuang Liu, Yunyang Xiong, Zechun Liu +2 more

The paper proposes VLM3, a simple, scalable method that demonstrates standard Vision Language Models (VLMs) can natively learn 3D understanding by focusing on architectural simplicity and specific dat…

View →
cs.CVcs.AIcs.CLRecentMay 29, 2026

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

Tianhui Liu, Jie Feng, Zhiheng Zheng, Shengyuan Wang +5 more

The paper introduces SpatialAct, a challenging benchmark that reveals a significant 'reasoning-to-action gap,' showing that current VLMs struggle to maintain coherent spatial understanding and perform…

View →
cs.CVcs.CGRecentMay 28, 2026

S2MDF: A Plug-And-Play Layer for Intersection-Free Multi-Object Signed Distance Fields

Deniz Sayin Mercadier, Federico Stella, Aurel Bizeau, Nicolas Talabot +1 more

The paper introduces S2MDF, a plug-and-play module that enforces a hard constraint to eliminate interpenetrations in multi-object Signed Distance Field (SDF) representations, significantly improving p…

View →
cs.AIRecentMay 28, 2026

Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent

Yiming Liu, Bin Lu, Meng Jin, Ziyuan Sang +5 more

The paper introduces Compass, an expert-guided LLM agent framework that successfully extracts and integrates thousands of previously inaccessible marine lead records from vast corpora of scientific pa…

View →