The paper adapts and evaluates two machine learning weather models, ArchesWeather and ArchesWeatherGen. It shows that, when forced with monthly sea surface temperature and sea ice boundary conditions, they produce stable long-term climate simulations that reproduce ERA5's climatology, large-scale circulation, and interannual variability.
We evaluate the climate simulation capabilities of ArchesWeather and ArchesWeatherGen, two machine learning models originally trained for weather forecasting and evaluated at lead times of up to 10 days. ArchesWeather is a deterministic model. ArchesWeatherGen is a probabilistic flow-matching model built on ArchesWeather's forecasts, which enables ensemble-based uncertainty quantification. In this work, we adapt both models to act as forced atmospheric models by additionally conditioning them on monthly mean sea surface temperature (SST) and sea ice cover (SIC) as boundary conditions. Specifically, we follow the Phase 1 protocol of the AI Model Intercomparison Project (AIMIP). Analogous to the Atmospheric Model Intercomparison Project (AMIP), this protocol proposes a standardized experimental setup for evaluating the climate skill of ML-based forced atmospheric models. We present a comprehensive evaluation of both models under this protocol, which includes three parts:
- a comparison against numerical climate models;
- ablation studies of key design choices in the extension;
- an analysis of forced versus unforced configurations.

Although both models were originally developed for weather forecasting, we show that their forced configurations produce stable long-term climate simulations, maintain a stable annual cycle, and capture the drift of many climate variables. The models faithfully reproduce ERA5's climatology, large-scale circulation, and interannual variability, and they capture the tails of the variables' distributions.
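The abstract describes turning a one-step weather emulator into a forced atmospheric model. Each autoregressive step is conditioned on the monthly mean SST and SIC of the month it falls in. The following is a minimal sketch of that rollout pattern under loudly stated assumptions, and it is not the ArchesWeather code or the AIMIP tooling:

- the state is a toy list with one float per grid cell;
- every month has the same number of model steps;
- `make_toy_step` is a hypothetical relaxation rule standing in for the trained network;
- the Gaussian noise only mimics spread between ensemble members and is not flow matching.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Toy stand-in for an atmospheric state: one float per grid cell.
Field = List[float]
# A one-step emulator: (state, monthly SST, monthly SIC) -> next state.
StepFn = Callable[[Field, Field, Field], Field]


def forced_rollout(step_fn: StepFn, x0: Field, sst_monthly: Sequence[Field],
                   sic_monthly: Sequence[Field], steps_per_month: int,
                   n_steps: int) -> List[Field]:
    """Autoregressive rollout in which every step is conditioned on the
    monthly mean SST and SIC of the month that the step falls in."""
    n_months = min(len(sst_monthly), len(sic_monthly))
    if n_steps > n_months * steps_per_month:
        raise ValueError("not enough monthly boundary conditions for n_steps")
    x = list(x0)
    trajectory = [x]
    for t in range(n_steps):
        # Assumption: every month has the same number of model steps.
        month = t // steps_per_month
        x = step_fn(x, sst_monthly[month], sic_monthly[month])
        trajectory.append(x)
    return trajectory


def make_toy_step(relax: float, noise: float, seed: int) -> StepFn:
    """Hypothetical stand-in for a trained network: nudges each cell toward
    (1 - SIC) * SST and adds Gaussian noise to mimic member-to-member spread."""
    rng = random.Random(seed)

    def step(x: Field, sst: Field, sic: Field) -> Field:
        return [xi + relax * ((1.0 - c) * s - xi) + rng.gauss(0.0, noise)
                for xi, s, c in zip(x, sst, sic)]

    return step


def run_ensemble(n_members: int, noise: float, x0: Field,
                 sst_monthly: Sequence[Field], sic_monthly: Sequence[Field],
                 steps_per_month: int, n_steps: int) -> Tuple[Field, Field]:
    """Run independently seeded members and return the per-cell mean and
    standard deviation of the final state across members."""
    finals = []
    for member in range(n_members):
        step_fn = make_toy_step(relax=0.5, noise=noise, seed=member)
        traj = forced_rollout(step_fn, x0, sst_monthly, sic_monthly,
                              steps_per_month, n_steps)
        finals.append(traj[-1])
    mean = [statistics.fmean(cell) for cell in zip(*finals)]
    spread = [statistics.pstdev(cell) for cell in zip(*finals)]
    return mean, spread


if __name__ == "__main__":
    # Two grid cells and three months of made-up boundary conditions.
    sst = [[10.0, 20.0], [12.0, 22.0], [11.0, 21.0]]
    sic = [[0.0, 0.5], [0.0, 0.2], [0.0, 0.0]]
    mean, spread = run_ensemble(n_members=8, noise=0.1, x0=[0.0, 0.0],
                                sst_monthly=sst, sic_monthly=sic,
                                steps_per_month=30, n_steps=90)
    print("ensemble mean:", mean)
    print("ensemble spread:", spread)
```

With `noise=0.0`, the sketch behaves like a deterministic model and every member is identical. With `noise > 0`, the members diverge, and their spread gives a crude ensemble uncertainty estimate. A real run would map each model time step to its actual calendar month rather than assume fixed-length months.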
Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents
The paper introduces MASA, a model-aware skill alignment framework that adaptive…
SKILLC: Learning Autonomous Skill Internalization in LLM Agents via Contrastive Credit Assignment
SkillC introduces a Contrastive Skill Credit Assignment (CSCA) framework to enab…
Benchmarking Machine Learning Uncertainty Quantification Methodologies for Predicting Turbine Gas Te…
This paper benchmarks five distinct uncertainty quantification methods—including…
SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction
The paper introduces SkillHarm, a comprehensive benchmark and automated framewor…
An Exploratory Study into using Machine-Learning for Fast Step-by-step Emulation of Numerical Mechan…
This study explores using machine learning surrogates to accelerate complex nume…
Skill-Conditioned Gated Self-Distillation for LLM Reasoning
The paper proposes Skill-Conditioned Gated Self-Distillation (SGSD), a novel fra…
You Live More Than Once: Towards Hierarchical Skill Meta-Evolving
The paper proposes HiSME, a lightweight hierarchical skill meta-evolving solutio…
Localizing Input Uncertainty Quantification for Large Language Models via Shapley Values
The paper proposes Shapley-based input uncertainty Quantification (ShaQ), a nove…