~ similar to 2605.29935· 20 results
Shibo Zhu, Xiaodan Shi, Dayin Chen, Yuntian Chen +3 more
The paper introduces CityTrajBench, a unified benchmark framework that standardizes the evaluation of city-scale vehicle trajectory generation, demonstrating that no single generation model dominates…
This paper systematically analyzes the high cross-architecture transferability of physical adversarial attacks on Vision-Language Models (VLMs) used in autonomous driving, demonstrating that attacks e…
Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more
RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…
The paper introduces MIRAGE, a framework that systematically discovers semantic attacks on online HD map construction by finding plausible environmental variations that bypass standard adversarial def…
This paper demonstrates that reasoning-enabled Vision-Language-Action (VLA) models for autonomous driving are highly vulnerable to realistic input perturbations, significantly compromising both reason…
Shuo Ju, Qingzhao Zhang, Huashan Chen, Xuheng Wang +5 more
The paper introduces a novel adversarial attack that uses static, view-dependent camouflage on a vehicle to induce consistent feature drift, causing autonomous systems to predict false, yet plausible,…
Shuo Lu, Yinuo Xu, Kecheng Yu, Siru Jiang +7 more
The paper introduces WorldCoder-Bench, a comprehensive benchmark and evaluation protocol for testing LLMs' ability to autonomously generate complex, physically grounded, and interactive 3D web worlds.
RayDer introduces a unified, feed-forward transformer that simplifies self-supervised novel view synthesis (NVS) by consolidating camera estimation, scene reconstruction, and rendering into a single,…
The paper proposes a multi-resolution end-to-end deep neural network for autonomous driving that dynamically adjusts input resolution to optimize the critical tradeoff between prediction accuracy and…
The paper proposes SafeDIG, a robust safety steering framework that adapts Diffusion Transformers for text-to-image generation by treating safety control as position-aware sparse feature transfer, ens…
Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more
The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…
The paper formally addresses the challenging question of cross-domain transferability of latent predictive models by proposing a structured framework that quantifies the relationship between source an…
Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more
The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…
Zhihong Liu, Siqi Kou, Zheng Li, Ye Ma +4 more
The paper introduces ProductWebGen, a benchmark for evaluating multimodal models' ability to generate consistent, high-fidelity product webpages from images and instructions, finding that separate edi…
Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more
The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.
Rudolf Krecht, Tamas Budai, Erno Horvath, Akos Kovacs +2 more
This paper provides a comprehensive review of network optimization aspects for Connected and Autonomous Vehicles (CAVs), aiming to clarify misconceptions and outline future research directions.
Yangzhen Wu, Aaron J. Li, Wenjie Ma, Li Cao +9 more
BenchEvolver introduces a solution-centric evolutionary framework to automatically transform saturated coding benchmarks into significantly harder, high-quality, and diverse evaluation suites.
This paper investigates the robustness of world models in vision-based quadrotor navigation and identifies factors governing their quality.
The paper introduces SchGen, the first large language model capable of generating editable PCB schematics from natural language by using a novel semantically grounded code representation.
AlphaTransit introduces a novel search-based planning framework that combines Monte Carlo Tree Search (MCTS) with a neural policy-value network to efficiently design high-quality, city-scale bus trans…