Papers similar to 2605.29387

~ similar to 2605.29387· 17 results

cs.LGcs.CLcs.CVRecentJun 2, 2026

Neuron Populations Exhibit Divergent Selectivity with Scale

Amil Dravid, Yasaman Bahri, Alexei A. Efros, Yossi Gandelsman

The study finds that specific, interpretable neuron populations (Rosetta Neurons) exhibit predictable, scale-dependent changes in selectivity and specialization as neural models grow larger.

View →

cs.LGcs.AIcs.CVRecentMay 28, 2026

How Much Is a Dataset Worth? Scaling Laws, the Vendi Score, and Matrix Spectral Functions

Jeff A. Bilmes, Gantavya Bhatt, Arnav M. Das

The paper introduces and analyzes several novel data appraisal metrics, including the Vendi Score and matrix spectral functions, demonstrating that efficient optimization techniques make these metrics…

View →

cs.LGcs.AIRecentMay 31, 2026

What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression

Wendao Wu, Fangqing Zhang, Haihan Zhang, Cong Fang

This paper develops a unified spectral analysis framework to explain how knowledge transfer (KT) works across different machine learning regimes, such as Knowledge Distillation and Weak-to-Strong gene…

View →

cs.LGcs.AIRecentMay 29, 2026

Rethinking the Role of Temperature in Large Language Model Distillation

Hoang-Chau Luong, Lingwei Chen

This paper re-examines the role of temperature ($ au$) in LLM distillation, demonstrating that while Reverse KL (RKL) is often preferred, Forward KL (FKL) significantly outperforms RKL at higher tempe…

View →

cs.CLRecentMay 31, 2026

Before and After Temperature: A Distributional View of Creative LLM Generation

V. S. Raghu Parupudi, Harsha Ponnada, Aditi Kaushal, S. Shria Parupudi +2 more

The paper introduces a novel, per-token feature derived from how sampling temperature reshapes the token distribution, demonstrating it is a significantly stronger predictor of LLM creativity than sta…

View →

cs.LGcs.AIphysics.comp-phRecentMay 27, 2026

Unveiling Multi-regime Patterns in SciML: Distinct Failure Modes and Regime-specific Optimization

Yuxin Wang, Yuanzhe Hu, Xiaokun Zhong, Xiaopeng Wang +6 more

This paper analyzes the multi-regime behavior of Scientific Machine Learning (SciML) models, finding that optimization effectiveness is regime-specific and that failure modes require a unified, regime…

View →

cs.LGcs.AIEmpiricalRecentJun 4, 2026

PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training

Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang +3 more

This paper proposes a preconditioning layer for stable weight conditioning in LLM training.

View →

cs.LGcs.AIEmpiricalRecentJun 4, 2026

PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training

Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang +3 more

This paper proposes a preconditioning layer for stable weight conditioning in LLM training.

View →

math.NAcs.LGRecentJun 1, 2026

Spectral Audit of In-Context Operator Networks

Zhiwei Gao, Liu Yang, George Em Karniadakis

The paper introduces a Jacobian-based spectral audit to evaluate neural operators, demonstrating that standard prediction error metrics fail to capture crucial local dynamical structures and operator…

View →

cs.LGcs.CLRecentMay 30, 2026

Task Structure Reverses Layerwise State Encoding in Sequence Models

Yuhang Jiang

The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…

View →

cs.LGcs.AIRecentMay 31, 2026

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

Boqian Wu, Qiao Xiao, Patrik Okanovic, Tomasz Sternal +5 more

This paper introduces a new scaling law for sparse language models trained with limited data, demonstrating that sparsity can significantly improve performance and delay data saturation during multi-e…

View →

cs.AIcs.CLRecentMay 27, 2026

The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic

Dominika Agnieszka Długosz, Arlindo Oliveira, Natalia Díaz-Rodríguez

The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…

View →

cs.LGcs.AIRecentMay 30, 2026

Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling

Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more

The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…

View →

cs.CLcs.AIcs.LGRecentJun 1, 2026

ProbeScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference

Sourav Das

ProbScale is a novel framework that combines neural scaling laws and language model probing to identify highly efficient, task-specific subnetworks within pre-trained Small Language Models, achieving…

View →

cs.LGcs.AIRecentMay 28, 2026

Do Physics Foundation Models Learn Generalizable Physics? A Bias-Aware Benchmark Across Physical Regimes and Distribution Shifts

Mengdi Chu, Yang Liu, Ayan Biswas, Han-Wei Shen

The paper introduces a comprehensive benchmark to test if physics foundation models learn generalizable dynamics, finding that their performance is highly conditional and not universally general.

View →

cs.LGcs.AImath.OCRecentMay 28, 2026

Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization

Ruoran Xu, Borong She, Xiaobo Jin, Qiufeng Wang

The paper introduces Singularity-aware Adam (S-Adam), a novel optimizer that stabilizes deep learning training in non-smooth loss landscapes by dynamically damping updates based on local geometric ins…

View →

cs.LGcs.AIcs.CVRecentMay 30, 2026

On the Difficulty of Learning a Meta-network for Training Data Selection

Zilin Du, Junqi Zhao, Boyang Albert Li

This paper analyzes the poor performance of Meta-learning for Training-data Selection (MTS) and proposes that increasing the batch size and incorporating informative features can significantly improve…

View →