ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2605.29283 · 17 results

physics.flu-dyn · cs.AI · cs.LG · Recent · May 31, 2026

Emergent Transfer of a Physics Foundation Model from Simulation to Laboratory Turbulence

Payel Mukhopadhyay, Stefan S. Nixon, Romain Watteaux, Michael McCabe +19 more

The authors demonstrate that a physics foundation model, finetuned on simulation data, can successfully predict complex laboratory fluid dynamics, specifically resolving a long-standing discrepancy in…

cs.CV · Recent · Jun 2, 2026

NewtPhys: Do Foundation Models Understand Newtonian Physics?

Sebastian Cavada, Soumava Paul, Tuan-Hung Vu, Andrei Bursuc +1 more

The paper introduces NewtPhys, a novel 4D dataset of real-world scenes with dense physical annotations, to systematically evaluate and reveal the limitations of foundation models in low-level Newtonia…

cs.LG · cs.AI · cs.CR · Recent · May 28, 2026

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

Anany Kotawala

The paper introduces NumLeak, a framework demonstrating that top-tier LLMs often exhibit high fidelity recall of specific public numeric benchmarks (like financial factors) due to memorization, which…

cs.LG · cs.AI · cs.CR · Recent · May 28, 2026

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

Anany Kotawala

The paper introduces NumLeak, a framework demonstrating that top-tier LLMs often exhibit high fidelity recall of specific public numeric benchmarks, suggesting that their apparent skill may be due to…

cs.AI · astro-ph.CO · cs.HC · Recent · May 28, 2026

Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

Nhat-Minh Nguyen

This case study demonstrates that in complex scientific software development, human domain expertise and careful supervision are more critical to ensuring the trustworthiness of AI-generated code than…

cs.AI · Recent · May 31, 2026

The Case for Model Science: Verify, Explore, Steer, Refine

Przemyslaw Biecek, Luca Longo, Jianlong Zhou, Thomas Fel +2 more

The paper advocates for the establishment of Model Science, a systematic discipline that moves beyond simple benchmarking to deeply analyze AI models' internal workings and failure modes.

cs.LG · cs.AI · Recent · May 27, 2026

A Multi-dimensional Framework for Evaluating Generalization in EEG Foundation Models

Aditya Kommineni, Emily Zhou, Kleanthis Avramidis, Tiantian Feng +1 more

The paper proposes a multi-dimensional evaluation framework to assess EEG foundation models under realistic low-resource conditions, finding that while these models excel in long-context tasks, their…

cs.AI · cs.LG · Recent · May 27, 2026

Adaptive Reservoir Computing for Multi-Scenario Chaotic System Forecasting

Shadmehr Zaregarizi, Khashayar Yavari

The paper introduces an adaptive reservoir computing framework that tailors Echo State Networks (ESNs) to specific evaluation scenarios, achieving a high score on the CTF-4-Science Lorenz benchmark fo…

eess.SY · cs.LG · Recent · Jun 1, 2026

Physics-Guided Recurrent State-Space Neural Networks for Multi-Step Prediction

Ruiyuan Li, Ajay Seth, Manon Kok

The paper proposes PG-RSSNN, a physics-guided recurrent state-space neural network that improves multi-step prediction stability and accuracy compared to both pure black-box and pure physics models, e…

cs.AI · cs.LG · Recent · May 28, 2026

When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop

Yang Zhang, Xiukun Wei, Xueru Zhang

This paper analyzes multi-model self-consuming training, showing that while human curation helps individual models, cross-model interactions can degrade long-term alignment by dampening or inverting t…

cs.AI · Recent · May 27, 2026

BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation

Sara Metcalf, William Schoenberg

The BEAMS initiative establishes comprehensive benchmarks and evaluates AI tools for modeling and simulation, finding that current AI tools excel at qualitative discussion tasks but struggle with comp…

cs.LG · cs.AI · physics.comp-ph · Recent · May 27, 2026

Unveiling Multi-regime Patterns in SciML: Distinct Failure Modes and Regime-specific Optimization

Yuxin Wang, Yuanzhe Hu, Xiaokun Zhong, Xiaopeng Wang +6 more

This paper analyzes the multi-regime behavior of Scientific Machine Learning (SciML) models, finding that optimization effectiveness is regime-specific and that failure modes require a unified, regime…

cs.AI · physics.app-ph · Recent · May 29, 2026

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

Ben Wang, Xiaogang Li, Ruochen Gao, Peiyao Xiao +5 more

The paper introduces BilliardPhys-Bench, a new benchmark that demonstrates that current multimodal LLMs struggle with complex physical reasoning and predicting object dynamics in simulated environment…

cs.AI · Recent · May 28, 2026

MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models

Tianzhuo Yang, Zihan Shen, Zirui Mi, Zhaoyi Zhang +6 more

The paper introduces MiraBench, a new benchmark that evaluates the action-conditioned reliability of robotic world models, finding that visual fidelity is insufficient and that optimism bias is a perv…

cs.CL · Recent · May 29, 2026

Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

Jianxiang Yu, Jiapeng Zhu, Bochen Lin, Qier Cui +2 more

The paper introduces MASA, a model-aware skill alignment framework that adaptively rewrites general and task-specific skills for LLM agents, achieving superior performance across diverse backbones and…
