Similar to 2606.01899 · 20 results
CIPER proposes a unified transformer framework to simultaneously perform cross-view image retrieval and precise 3-DoF pose estimation, overcoming the limitations of cascaded, separate methods.
Jiaxi Liu, Hangyu Li, Yang Cheng, Rui Gana +6 more
The paper proposes a pose-conditioned, permutation-equivariant denoiser to accurately reconstruct work zone geometry using noisy Ultra-Wideband (UWB) range data from connected and autonomous vehicles…
Mingxi Zhang, Renjie Xie, Jincheng Wang, Guyue Li +1 more
The paper proposes a lightweight, self-adaptive framework using LoRA to efficiently extract and aggregate radio frequency fingerprints for robust open-set authentication in dynamic wireless environmen…
The paper proposes DRIFT, a lightweight joint channel estimation and prediction framework, to significantly reduce pilot overhead and boost spectral efficiency in power-constrained LEO Non-Terrestrial…
The paper proposes an uncertainty-aware, decentralized fusion layer for multi-UAV systems that significantly improves 3D localization robustness by incorporating neighbor constraints and handling faul…
The paper introduces a novel two-stage framework for robust, category-agnostic in-context object localization (ICL) by optimizing attention and minimizing localization error using reinforcement…
Jiawei Li, Ziyi Liu, Weijie Shi, Long Chen +2 more
SSR3D-LLM introduces a structured spatial reasoning interface for unified 3D-LLMs, allowing fine-grained object grounding by generating and processing sequential latent spatial steps.
The paper proposes MoEIoU, a novel mixture-of-experts-based regression loss that adaptively models bounding-box localization errors, achieving superior convergence and accuracy in object detection.
FLORO is a multimodal geospatial foundation model that learns transferable remote sensing representations from a small, diverse corpus, achieving strong performance across various sensor types and res…
The paper proposes a communication-centric 6G-LLM architecture for tactical autonomous defense vehicles, demonstrating significant improvements in coordination and communication efficiency over conven…
Yuhua Xu, Mingtao Jiang, Chenfei Hu, Yinglong Wang +4 more
The paper proposes VerFU, a client-verifiable federated unlearning framework for low-altitude wireless networks that allows devices to ensure the server accurately removes their historical data contri…
Zhipeng Cai, Zhuang Liu, Yunyang Xiong, Zechun Liu +2 more
The paper proposes VLM3, a simple, scalable method that demonstrates standard Vision Language Models (VLMs) can natively learn 3D understanding by focusing on architectural simplicity and specific dat…
Pengyu Chen, Weiyang Li, Jin Xu, Jiacheng Wang +3 more
This paper surveys model forensics in AI-native wireless networks, detailing key security problems and demonstrating practical workflows for verifying model authenticity and detecting malicious functi…
ROVER is a lightweight, learnable plugin that efficiently routes and integrates object-centric visual evidence across multiple images and objects, significantly improving performance on grounded multi…
PropLLM introduces a novel propagation-aware framework that uses LLMs and hop-by-hop scene reconstruction to accurately localize root causes and determine fault types in complex network fault diagnosi…
Ziyu Song, Jiaming Fang, Kuangyu Li, Tuo Xia +1 more
This paper proposes Tail-Aware Adaptive-k (TAA-k), a training-free framework for adaptive context selection in retrieval-augmented generation systems using Extreme Value Theory.
Ultra Diffusion Poser is a novel diffusion model that improves human motion tracking from sparse IMUs and UWB ranging by explicitly modeling the geometric constraints imposed by inter-sensor distances…
Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai +1 more
The paper proposes a training-free framework, Visual Representation-Guided Video-LLM Reasoning, to perform composed video retrieval by using visual examples and text instructions, achieving strong per…
MASER is a lightweight framework that dynamically routes a shared Vision-Language Model (VLM) to the most appropriate modality-specific adapter (e.g., point cloud, RGB) based on the input question, si…
Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more
The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…