ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2605.31196· 20 results

cs.CVcs.CLcs.RORecentJun 1, 2026

RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation

Huiqiong Li, Jiayu Wang, Zhiting Mei, Anirudha Majumdar +2 more

The paper introduces RoboTrustBench, a comprehensive benchmark that evaluates the trustworthiness of video world models for robotic manipulation across challenging scenarios, finding that current mode…

View →
cs.CRcs.AIcs.CVRecentMar 28, 2026

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

Xiao Li, Xiang Zheng, Yifeng Gao, Xinyu Xia +34 more

This survey provides a comprehensive, structured review of safety research in Embodied AI, analyzing attacks and defenses across the entire embodied pipeline to guide the development of safe, robust,…

View →
cs.CLcs.AIcs.CVRecentJun 1, 2026

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning

Yusong Zhao, Yuejin Xie, Youliang Yuan, Junjie Hu +3 more

The paper introduces PaSBench-Video, a comprehensive streaming video benchmark designed to rigorously test multimodal LLMs' ability to issue proactive safety warnings, finding that current models stru…

View →
cs.ROcs.AIcs.LGRecentJun 1, 2026

Permissive Safety Through Trusted Inference: Verifiable Belief-Space Neural Safety Filters for Assured Interactive Robotics

Haimin Hu

The paper proposes an algorithmic method using conformal prediction to formally certify high-probability safety for Belief-Space Neural Safety Filters (BeliefSF), significantly improving safety guaran…

View →
cs.CVcs.LGcs.RORecentJun 2, 2026

VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring

Hanjiang Hu, Yiyuan Pan, Jiaxing Li, Xusheng Luo +4 more

VLESA is a novel framework that monitors human activities from egocentric video to predict and intervene in dangerous actions by incorporating goal-conditioned safety checks based on inferred intent.

View →
cs.ROcs.AIRecentMay 29, 2026

GSAM: A Generalizable and Safe Robotic Framework for Articulated Object Manipulation

Beichen Shao, Mengying Xie, Heng Su, Wanyi Zhang +4 more

GSAM introduces a generalizable and safe robotic framework for articulated object manipulation, significantly improving success rates and reducing variability across diverse tasks by integrating commo…

View →
cs.CLRecentMay 29, 2026

EMBGuard: Constructing Hazard-Aware Guardrails for Safe Planning in Embodied Agents

Dongwook Choi, Taeyoon Kwon, Bogyung Jeong, Minju Kim +5 more

EMBGuard introduces a novel, MLLM-based safety guardrail that explicitly identifies and explains physical hazards from (visual observation, action) pairs, enabling safer planning for embodied agents.

View →
cs.AIphysics.app-phRecentMay 29, 2026

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

Ben Wang, Xiaogang Li, Ruochen Gao, Peiyao Xiao +5 more

The paper introduces BilliardPhys-Bench, a new benchmark that demonstrates that current multimodal LLMs struggle with complex physical reasoning and predicting object dynamics in simulated environment…

View →
cs.CVcs.AIRecentMay 30, 2026

Pause and Think: A Dataset and Benchmark for Video-Grounded Assistive Action Suggestion

Shivam Singh, Saptarshi Majumdar, Pratik Prabhanjan, Zicheng Liu +1 more

The paper introduces pause-and-think-T, a reasoning-centric dataset and benchmark that enables compact Vision-Language Models to perform visually grounded, context-aware action suggestion, matching la…

View →
cs.ROcs.AIcs.CVRecentMay 31, 2026

DeepIPCv3: Event-Aware Multi-Modal Sensor Fusion for Sudden Pedestrian Crossing Avoidance

Oskar Natan, Andi Dharmawan, Aufaclav Zatu Kusuma Frisky, Jazi Eko Istiyanto +1 more

DeepIPCv3 is a novel multi-modal framework that fuses LiDAR and DVS event streams using cross-modal attention to achieve state-of-the-art, highly reactive avoidance maneuvers for sudden pedestrian cro…

View →
cs.CRcs.AIRecentMay 6, 2026

How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study

Junran Wang, Xinjie Shen, Zehao Jin, Pan Li

The paper introduces ImmersedPrivacy, an interactive audio-visual framework, and finds that current Vision-Language Models (VLMs) deployed in physical environments suffer from significant deficits in…

View →
cs.ROcs.CRRecentMay 15, 2026

Propagating Unsafe Actions in LLM Controlled Multi-Robot Collaboration via Single Robot Compromise

Zhen Huang, Zhihuang Liu, Mengxuan Luo, Weishang Wu +1 more

The paper proposes a novel attack paradigm demonstrating how compromising a single robot in an LLM-controlled multi-robot system can rapidly propagate malicious intent to cause coordinated unsafe acti…

View →
cs.SEcs.CRRecentMay 31, 2026

SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces

Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao +6 more

The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmfu…

View →
cs.CRcs.RORecentMay 19, 2026

RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents

Doguhuan Yeke, Yanming Zhou, Leo Y. Lin, Hongyu Cai +2 more

The paper introduces RoboJailBench, the first standardized evaluation framework for assessing adversarial jailbreak attacks and defenses in embodied AI systems like robots.

View →
cs.ROcs.AIRecentMay 29, 2026

TARIC: Memory-Augmented Traversability-Aware Outdoor VLN under Interrupted Semantic Cues

Tianle Zeng, Hanjing Ye, Jianwei Peng, Jingwen Yu +2 more

The paper proposes a memory-augmented, traversability-aware framework for outdoor VLN that maintains stable, goal-consistent guidance even when semantic cues are interrupted or unavailable.

View →
cs.CVcs.AIRecentMay 29, 2026

Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?

Jingtao He, Hongliang Lu, Xiaoyun Qiu, Yixuan Wang +1 more

The paper introduces a structured multi-level visual perturbation framework to systematically analyze how dependent VLA-based driving behavior is on visual information, revealing uneven visual groundi…

View →
cs.ROcs.AIeess.SYRecentMay 30, 2026

PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation

Haofan Cao, Zhaoyang Li, Zhichao You, Liang Guo +1 more

PaCo-VLA introduces a passivity-shielded compliance prior to safely bridge the gap between high-level Vision-Language-Action (VLA) semantic outputs and low-level, force-sensitive robotic control.

View →
cs.CRcs.AIcs.LGRecentApr 1, 2026

Safety, Security, and Cognitive Risks in World Models

Manoj Parmar

This paper surveys the risks associated with world models, proposing a unified threat model and demonstrating adversarial attacks that show world models require rigorous safety standards comparable to…

View →
cs.CVRecentJun 1, 2026

TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos

Jinpeng Liu, Yukang Xu, Yutong Li, Xingyu Liu

TROPHIES introduces a unified framework to jointly reconstruct dynamic humans, static scenes, and camera poses from multi-view videos, achieving globally consistent and physically plausible 4D reconst…

View →
cs.CVcs.AIRecentMay 29, 2026

ERGeoBench:A Comprehensive Benchmark for Embodied Reasoning and Geo-localization in Multimodal Large Language Models

Kaiwen Xue, Tao Wei, Guoxin Zhang, Zhonghong Ou +4 more

The paper introduces ERGeoBench, a comprehensive diagnostic benchmark designed to evaluate the fine-grained capabilities of multimodal large language models (MLLMs) for embodied geo-localization acros…

View →