Similar to 2605.29411 · 20 results
Andrej Tschalzev, Nick Erickson, Yuyang Wang, Huzefa Rangwala +3 more
The paper introduces TabPrep, a feature engineering pipeline that systematically improves performance across various tabular machine learning models by addressing structural data patterns ignored by c…
Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more
The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…
This paper systematically evaluates the consistency of popular causal discovery benchmarks against real-world scientific literature, revealing significant variability in their accuracy.
The paper introduces Influence-Guided Symbolic Regression (IGSR), a novel framework that uses granular influence scores to guide LLMs in efficiently searching for and discovering complex mathematical…
TabChange proposes a novel framework to generate natural and minimally altered counterfactual instances in tabular data by precisely controlling attribute modifications based on their relationship str…
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.
Zihan Chen, Yiming Zhang, Wenxiang Geng, Zenghui Ding +1 more
The paper theoretically explains that optimizing LLMs solely on outcomes leads to brittle reasoning (Reward-Induced Manifold Collapse) by favoring low-complexity shortcuts, and proposes process-based…
This paper establishes the identifiability of latent regimes and regime-dependent causal structures in complex non-stationary time series modeled by Markov Switching Models, even with instantaneous ef…
The paper introduces Semantic Triplet Restoration (STR), a novel protocol that converts complex table structures into atomic semantic triplets, improving table question answering by providing explicit…
The paper demonstrates that supervised fine-tuning significantly outperforms frontier zero-shot large language models for screen-conditioned action prediction on the PiSAR benchmark, highlighting the…
The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…
This paper introduces BBOmix, an open-source benchmark for unsupervised representation learning on real-world biological data.
The paper introduces Geometry-Aware Tabular Diffusion (GATD), a method that enhances tabular data synthesis by explicitly incorporating pairwise geometric relationships (angles and lengths) into the d…
Zizhen Deng, Jiaru Zhang, Rui Ding, Huang Bojun +4 more
The paper proposes Test-Time Training for Supervised Causal Learning (TTT-SCL), a novel framework that dynamically generates training data aligned with specific test instances to significantly improve…
The paper proposes a Doeblin-anchored contrastive chart to learn valid Markov transition kernels by combining the target transition with a restart law, ensuring the learned object is mathematically so…
The paper introduces BlockGen, a blockwise sequence model, to investigate the performance of uniform-state versus masked diffusion models when generating sequences block-by-block, showing that the per…
The paper investigates predictive multiplicity and arbitrariness in recidivism risk assessment, finding that similarly accurate models often exhibit high predictive agreement, and proposes a simple po…
The paper proposes a novel, practical upper bound to estimate the worst-case performance of medical prediction models on the target population, even when the selection bias mechanism and target data a…
Aravind Mandiga, Guoming Li, Jin Lu, Ismailcem Budak Arpinar +2 more
The paper introduces ProtStructQA, an executable benchmark that tests protein structural reasoning by requiring language models to generate measurable 3D coordinates, revealing a capability-dependent…
The paper introduces NumLeak, a framework demonstrating that top-tier LLMs often exhibit high-fidelity recall of specific public numeric benchmarks (like financial factors) due to memorization, which…