~ similar to 2605.29153· 19 results
The paper introduces a comprehensive benchmark to test if physics foundation models learn generalizable dynamics, finding that their performance is highly conditional and not universally general.
The paper introduces Singularity-aware Adam (S-Adam), a novel optimizer that stabilizes deep learning training in non-smooth loss landscapes by dynamically damping updates based on local geometric ins…
Xiangtao Meng, Wenyu Chen, Chuanchao Zang, Xinyu Gao +4 more
This paper systematically measures and explains how sequential model defenses can conflict, finding that 38.9% of ordered defense sequences cause measurable risk exacerbation due to anti-aligned param…
The paper proposes a novel information-geometric framework to analyze LLM stability by integrating task utility, external entropy, and internal structural proxies, showing this composite score improve…
The scaling exponent in neural scaling laws is not fixed but systematically depends on the optimizer used, with preconditioned optimizers generally yielding steeper scaling.
This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…
Przemyslaw Biecek, Luca Longo, Jianlong Zhou, Thomas Fel +2 more
The paper advocates for the establishment of Model Science, a systematic discipline that moves beyond simple benchmarking to deeply analyze AI models' internal workings and failure modes.
The paper proposes evaluating certified training methods by comparing their Pareto fronts across the natural-certified accuracy trade-off, revealing superior performance and previously unappreciated c…
The paper introduces a unified Physics-Informed Deep Learning (PIDL) framework that simultaneously enforces physical laws and information-theoretic bounds, demonstrating robust, domain-agnostic entrop…
Arnaud Descours, Arnaud Guillin, Geoffrey Lacour, Manon Michel +2 more
This paper develops a novel, computationally efficient method to quantify the uncertainty in wide neural network predictions by characterizing the limiting random fluctuations using stochastic evoluti…
The paper challenges the assumption that LLM safety is a binary threshold, proposing that safety failures occur in an 'instability region' and introducing Furina, a transferable attack that exploits t…
The paper introduces Multi-Response Training (MRT) to combat the 'mode lottery' problem in language model fine-tuning, showing that retaining multiple valid responses significantly improves distributi…
The paper introduces an adaptive reservoir computing framework that tailors Echo State Networks (ESNs) to specific evaluation scenarios, achieving a high score on the CTF-4-Science Lorenz benchmark fo…
The paper proposes a local perturbation theory showing that cross-domain interference in multi-domain RL occurs via a low-dimensional shared conflict subspace, which can be selectively mitigated by sh…
PARD-SSM is a probabilistic framework that models network traffic as a switching state-space system to detect multi-stage cyber-attacks in real-time with high accuracy and predictive capability.
This paper analyzes multi-model self-consuming training, showing that while human curation helps individual models, cross-model interactions can degrade long-term alignment by dampening or inverting t…
The paper introduces hybrid neural world models that provide fast, multi-horizon predictions for complex physical dynamics, implicitly handling sharp events like shocks and contacts without explicit t…
The paper introduces a Jacobian-based spectral audit to evaluate neural operators, demonstrating that standard prediction error metrics fail to capture crucial local dynamical structures and operator…
Riju Marwah, Ritvik Garimella, Vishal Pallagani, Atishay Jain +2 more
The paper formalizes LLM degradation during long generation as 'cognitive fatigue' and introduces the Fatigue Index (FI), a measurable, model-agnostic diagnostic tool for real-time monitoring.