~ similar to 2605.29823 · 17 results
This paper introduces survey sampling techniques to estimate or minimize empirical pairwise loss functions, showing that targeting informative pairs significantly reduces computational cost while main…
The paper introduces the Vector Network (VN), a novel recurrent architecture that replaces fixed weight matrices with reusable weight atoms, enabling superior compositional generalization by making st…
The paper theoretically analyzes the properties that optimal sparse autoencoder (SAE) dictionaries must satisfy, deriving constraints that explain observed SAE behaviors like hierarchical splitting an…
Jiafu Huang, Chao Peng, Chenyang Xu, Zhengfeng Yang +6 more
The paper proposes using an auxiliary reconstruction task, specifically one that captures intra-state feature dependencies, to improve the quality of state representations learned by the encoder in ne…
Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more
The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…
The paper formalizes the problem of representation identifiability in supervised learning, showing that a representation property is identifiable if and only if it is constant across all possible fact…
The paper proposes a novel neural network compression technique that aggregates neurons with similar functional dynamics, achieving significant model size reduction while maintaining high accuracy.
The paper introduces and analyzes several novel data appraisal metrics, including the Vendi Score and matrix spectral functions, demonstrating that efficient optimization techniques make these metrics…
This study empirically benchmarks classical and quantum machine learning models for image recognition, finding that while quantum models offer superior accuracy and resource efficiency at high dimensi…
The paper proposes a decision-aware quadratic replacement for the ReLU activation function, enabling low-degree, calibration-lossless polynomial approximations for neural network inference under Fully…
This paper analyzes the computational complexity of verifying feedforward neural networks when their weights are restricted to finite-width arithmetic, finding that verification remains NP-complete fo…
The paper introduces Inconsistency-Aware Minimization (IAM), a novel training objective that uses a label-free measure called local inconsistency to improve model generalization, particularly in semi-…
Junling Wang, Boqi Chen, Heejin Do, Mubashara Akhtar +2 more
The paper introduces a new benchmark, E2V-Bench, to evaluate text-to-image models on generating pedagogically accurate visuals from arithmetic equations, finding that current models often fail due to…
This paper provides the first non-vacuous generalization analysis for the Stochastic Variance Reduced Gradient (SVRG) method by establishing sharp, data-dependent algorithmic stability bounds, thereby…
Debopam Sanyal, Anantharaman Iyer, Alind Khare, Trisha Jain +4 more
KLAS introduces a novel framework that uses KL divergence to automatically select optimal pairs of pretrained models for stitching, significantly improving the accuracy-efficiency tradeoff of resultin…
This paper investigates the phenomenon of 'copying' in Distribution Matching Distillation (DMD), finding that high-dimensional distillation causes student models to spontaneously reproduce the teacher…