Papers similar to 2605.30818

~ similar to 2605.30818· 16 results

cs.AIRecentMay 28, 2026

OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields

Wanhao Liu, Jiaqing Xie, Qian Tan, Weida Wang +9 more

The paper introduces OmniMatBench, a comprehensive, human-calibrated multimodal reasoning benchmark covering 19 materials science subfields, revealing that current multimodal language models (MLLMs) h…

View →

cond-mat.mtrl-scics.ETcs.LGRecentJun 1, 2026

Towards Automated Discovery: A Review of Generative Models, Multimodal Learning and Closed-Loop Workflows in Inverse Materials Design

Anand Babu, Rogério Almeida Gouvêa, Gian-Marco Rignanese

This review surveys advanced techniques—including generative models, multimodal learning, and closed-loop workflows—for automated inverse materials design, enabling the targeted discovery of novel cry…

View →

cs.CRRecentMay 13, 2026

Phantom Force: Injecting Adversarial Tactile Perceptions into Embodied Intelligence via EMI

Zirui Kong, Youqian Zhang, Sze Yiu Chau

This paper investigates a novel vulnerability in tactile sensing by demonstrating that targeted Electromagnetic Interference (EMI) can induce strong, misleading 'phantom forces' in Hall-effect fingert…

View →

cs.SDcs.AIcs.MMRecentMay 27, 2026

EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction

Chong Jing, Zitong Lan, Junan Zhang, Zhizheng Wu

EigeNet introduces a geometry-informed multi-modal Transformer framework to achieve state-of-the-art few-shot novel view Room Impulse Response (RIR) prediction by effectively integrating spatial geome…

View →

cs.CRRecentApr 4, 2026

Perceptual Gaps: ASCII Art and Overlapping Audio as CAPTCHA

Choon-Hou Rafael Chong

The paper proposes two novel CAPTCHA types—ASCII art and overlapping audio—and demonstrates that current frontier LLMs struggle significantly to solve them, suggesting they are highly effective anti-b…

View →

cs.LGcs.AIcs.CRRecentMay 8, 2026

UMEDA: Unified Multi-modal Efficient Data Fusion for Privacy-Preserving Graph Federated Learning via Spectral-Gated Attention and Diffusion-Based Operator Alignment

Shih-Yu Lai, Hirozumi Yamaguchi, Shang-Tse Chen, Yu-Lun Liu +1 more

UMEDA introduces a novel graph federated learning framework that uses spectral signal processing and diffusion models to enable privacy-preserving, robust localization across clients with highly heter…

View →

cs.AIcond-mat.mtrl-sciRecentMay 31, 2026

Property Prediction of Stacked Bilayer Materials: A Multimodal Learning Approach

An Vuong, Minh-Hao Van, Chen Zhao, Xintao Wu

The paper proposes a novel multimodal learning approach to predict the properties of new bilayer 2D materials formed by stacking dissimilar functional layers.

View →

cs.AIRecentMay 31, 2026

Towards Understanding Modality Interaction in Multimodal Language Models via Partial Information Decomposition

Wanlong Fang, Tianle Zhang, Wen Tao, Alvin Chan

The paper introduces Partial Information Decomposition (PID) to quantitatively separate unique, redundant, and synergistic contributions of different modalities (e.g., vision, language) in multimodal…

View →

cs.AIphysics.app-phRecentMay 29, 2026

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

Ben Wang, Xiaogang Li, Ruochen Gao, Peiyao Xiao +5 more

The paper introduces BilliardPhys-Bench, a new benchmark that demonstrates that current multimodal LLMs struggle with complex physical reasoning and predicting object dynamics in simulated environment…

View →

cs.DCcs.AIRecentJun 1, 2026

Boosting Multimodal Federated Learning via Chained Modality Optimization

Zixin Zhang, Fan Qi, Shuai Li, Xiaoshan Yang +1 more

The paper proposes FedMChain, a novel federated learning framework that structures multimodal training into sequential phases to mitigate modality competition and improve model performance while reduc…

View →

cond-mat.mtrl-scics.CEcs.CLRecentMay 29, 2026

A Padding Method for Enhanced Encoding of Inorganic Structures with Varying Chemical Compositions

Thang Dang, Haderbache Amir, Tzanakakis Alexandros, Yoshimoto Yuta

The paper introduces a novel padding method that leverages crystal symmetry to enhance the encoding of complex inorganic structures, significantly improving the generation of stable, novel materials.

View →

cs.CRRecentApr 23, 2026

Cross-Modal Phantom: Coordinated Camera-LiDAR Spoofing Against Multi-Sensor Fusion in Autonomous Vehicles

Shahriar Rahman Khan, Raiful Hasan

The paper demonstrates a coordinated, cross-modal spoofing attack that successfully deceives state-of-the-art multi-sensor fusion systems in autonomous vehicles by making multiple sensors agree on a f…

View →

cs.CVcs.AIRecentJun 1, 2026

MASER: Modality-Adaptive Specialist Routing for Embodied 3D Spatial Intelligence

Hilton Raj, Vishnuram AV

MASER is a lightweight framework that dynamically routes a shared Vision-Language Model (VLM) to the most appropriate modality-specific adapter (e.g., point cloud, RGB) based on the input question, si…

View →

cs.CVRecentJun 1, 2026

From Extrinsic to Intrinsic: Geodesic-Guided Representation Learning for 3D Geometric Data

Yuming Zhao, Junhui Hou, Qijian Zhang, Jia Qin +1 more

The paper introduces PRISM, a novel representation learning framework that learns isometric embeddings by explicitly modeling the intrinsic geodesic metric of 3D surfaces, achieving superior performan…

View →

cs.CRcs.CVRecentMay 10, 2026

On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models

Yule Liu, Yilong Yang, Jiale Teng, Hanze Jia +10 more

The paper systematically measures the risk of current image-to-3D models generating harmful geometries, finding that these models are effective at reconstruction and existing safeguards are insufficie…

View →

cs.SDcs.AIcs.IRRecentMay 29, 2026

Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation

Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos +1 more

The paper proposes an inference-time activation steering framework, utilizing orthogonalization, to achieve fine-grained, deterministic control over discrete musical attributes like Pitch and Duration…

View →