~ similar to 2606.00241· 19 results
Haoji Hu, Huaqing Mao, Yijun Lin, Xiaowei Jia +3 more
The paper proposes a novel nonparametric mutual information estimator to robustly quantify dependence between heterogeneous temporal data, specifically continuous time series and discrete event sequen…
Suryash Yagnik, Shubham Gaur, Saksham Thakur, Vinija Jain +2 more
The paper introduces 5WBENCH, a new benchmark for causal unlearning, and proposes MAAT, a novel three-phase framework that achieves high forgetting and high retention specifically on complex 'Why'-typ…
The paper introduces Score Broadcast and Decorrelation (SBD), a general theoretical framework that unifies broadcast-based credit assignment across various differentiable loss functions by leveraging…
Zhi Zhou, Ming Yang, Shi-Yu Tian, Kun-Yang Yu +2 more
The paper establishes the first theoretical framework for analyzing the learnability of Test-Time Adaptation (TTA) under non-stationary data streams by introducing Recovery Complexity, which quantifie…
Kesha Ou, Zhen Tian, Wayne Xin Zhao, Long Zhang +2 more
This paper proposes a novel framework, DS-MLP, for click-through rate prediction in online advertising and recommendation systems.
Giuliano Martinelli, Piriyakorn Piriyatamwong, Abelardo Carlos Martinez Lorenzo, Jasmin Baier +6 more
The paper introduces Query2Effect, a large-scale benchmark, and a two-step framework to predict causal effect sizes from natural language queries, showing that structured representation significantly…
Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo +2 more
The paper proposes Locality-Aware Redundancy Pruning (LoRP), a training-free method that prunes LLM layers by exploiting localized inter-layer redundancy, leading to improved efficiency while maintain…
The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…
This paper analyzes the poor performance of Meta-learning for Training-data Selection (MTS) and proposes that increasing the batch size and incorporating informative features can significantly improve…
The paper proposes a semi-relaxed Gromov-Wasserstein objective to estimate the latent connectivity structure of large-scale networks, achieving statistically consistent and efficient recovery of the u…
Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye +3 more
This paper proposes a new framework called STRIDE for training data attribution in Large Language Models.
The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…
The paper introduces NaRA, a noise-aware LoRA technique that dynamically adapts fine-tuning parameters based on the noise level during diffusion, significantly improving the performance of Diffusion L…
The paper introduces trust functions to filter weak supervision labels, enabling near-lossless weak-to-strong generalization by selectively training a strong student using only the most reliable weak…
Qi Liu, Mingdi Sun, Yongyi He, Zhi Zheng +4 more
The paper proposes EKSFT, a selective fine-tuning method that masks high-entropy or high-KL divergence tokens during Supervised Fine-Tuning (SFT) to prevent distribution shift and improve subsequent R…
DASH introduces a dual-branch distillation framework to effectively compress class-conditional diffusion models by independently supervising both score branches, significantly preserving guidance fide…
Hui Yang, Daiwei He, Kevin Jiang, Taejin Park +19 more
The paper introduces a novel paradigm where a fine-tuned LLM acts as an ancillary predictor to forecast likely advertisers, significantly improving ad recommendation systems by augmenting candidate ge…
The paper introduces CalArena, a large-scale, standardized benchmark covering nearly 2000 experiments to comprehensively evaluate post-hoc calibration methods, finding that smooth calibration function…
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.