ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2605.29411 · 20 results

cs.LG · Recent · Jun 1, 2026

TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks

Andrej Tschalzev, Nick Erickson, Yuyang Wang, Huzefa Rangwala +3 more

The paper introduces TabPrep, a feature engineering pipeline that systematically improves performance across various tabular machine learning models by addressing structural data patterns ignored by c…

cs.CL · Recent · Jun 1, 2026

Encoded but Not Routed: Explaining the Table-Chart Gap in Scientific Claim Verification

Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more

The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…

cs.AI · Recent · Jun 1, 2026

Consistency evaluation of benchmarks used for causal discovery

Yuzhe Zhang, Chihui Chen, Lina Yao, Chen Wang

This paper systematically evaluates the consistency of popular causal discovery benchmarks against real-world scientific literature, revealing significant variability in their accuracy.

cs.LG, cs.AI · Recent · May 27, 2026

Influence-Guided Symbolic Regression: Scientific Discovery via LLM-Driven Equation Search with Granular Feedback

Evgeny S. Saveliev, Samuel Holt, Nabeel Seedat, David L. Bentley +2 more

The paper introduces Influence-Guided Symbolic Regression (IGSR), a novel framework that uses granular influence scores to guide LLMs in efficiently searching for and discovering complex mathematical…

cs.LG, cs.AI · Recent · May 30, 2026

TabChange: Precise Attribute Changes in Tabular Data

Arjun Dahal, Yu Lei, Raghu N. Kacker, Richard Kuhn

TabChange proposes a novel framework to generate natural and minimally altered counterfactual instances in tabular data by precisely controlling attribute modifications based on their relationship str…

cs.CL, cs.IR · Recent · Jun 3, 2026

Caliper: Probing Lexical Anchors versus Causal Structure in LLMs

Zhenyu Yu, Shuigeng Zhou

This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.

cs.LG, cs.AI · Recent · May 30, 2026

The Paradox of Outcome Optimization: A Causal Information-Theoretic Bound on Reasoning Shortcuts in LLMs

Zihan Chen, Yiming Zhang, Wenxiang Geng, Zenghui Ding +1 more

The paper theoretically explains that optimizing LLMs solely on outcomes leads to brittle reasoning (Reward-Induced Manifold Collapse) by favoring low-complexity shortcuts, and proposes process-based…

stat.ML, cs.LG, stat.ME · Recent · Jun 1, 2026

Identifiable Markov Switching Models with Instantaneous Effects and Exponential Families

Roel Hulsman, Carles Balsells-Rodas, Sara Magliacane

This paper establishes the identifiability of latent regimes and regime-dependent causal structures in complex non-stationary time series modeled by Markov Switching Models, even with instantaneous ef…

cs.CL · Recent · May 29, 2026

Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models

Yibin Zhao, Fangxin Shang, Dingrui Yang, Yuqi Wang

The paper introduces Semantic Triplet Restoration (STR), a novel protocol that converts complex table structures into atomic semantic triplets, improving table question answering by providing explicit…

cs.AI, cs.CL, cs.HC · Recent · May 28, 2026

Architecture-Sensitive Supervised Fine-Tuning for Screen-Conditioned Action Prediction: A PiSAR Benchmark

Rahul Bissa, Abhishek Vyas, Yash Jain

The paper demonstrates that supervised fine-tuning significantly outperforms frontier zero-shot large language models for screen-conditioned action prediction on the PiSAR benchmark, highlighting the…

cs.CL, cs.AI, cs.LG · Recent · Jun 1, 2026

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

Atoosa Chegini, Soheil Feizi

The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…

cs.LG · Recent · Jun 3, 2026

BBOmix: A Tabular Benchmark for Hyperparameter Optimization of Unsupervised Biological Representation Learning

Luca Thale-Bombien, Jan Ewald, Ralf König, Aaron Klein

This paper introduces BBOmix, an open-source benchmark for unsupervised representation learning on real-world biological data.

cs.LG, cs.AI, cs.CR · Recent · May 23, 2026

Geometry-Aware Tabular Diffusion

David Turtora Zagardo

The paper introduces Geometry-Aware Tabular Diffusion (GATD), a method that enhances tabular data synthesis by explicitly incorporating pairwise geometric relationships (angles and lengths) into the d…

cs.LG, cs.AI · Recent · May 28, 2026

Test Time Training for Supervised Causal Learning

Zizhen Deng, Jiaru Zhang, Rui Ding, Huang Bojun +4 more

The paper proposes Test-Time Training for Supervised Causal Learning (TTT-SCL), a novel framework that dynamically generates training data aligned with specific test instances to significantly improve…

cs.LG · Recent · Jun 1, 2026

A Doeblin-Anchored Contrastive Chart for Learning Markov Transition Kernels

Ao Xu

The paper proposes a Doeblin-anchored contrastive chart to learn valid Markov transition kernels by combining the target transition with a restart law, ensuring the learned object is mathematically so…

cs.LG · Recent · Jun 1, 2026

BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers

Justin Deschenaux, Caglar Gulcehre

The paper introduces BlockGen, a blockwise sequence model, to investigate the performance of uniform-state versus masked diffusion models when generating sequences block-by-block, showing that the per…

cs.LG, cs.CY · Recent · Jun 1, 2026

Model Multiplicity and Predictive Arbitrariness in Recidivism Risk Assessment

Ashwin Singh, Carlos Castillo

The paper investigates predictive multiplicity and arbitrariness in recidivism risk assessment, finding that similarly accurate models often exhibit high predictive agreement, and proposes a simple po…

cs.LG, cs.AI, stat.ML · Recent · May 30, 2026

A Practical Upper Bound on Selection Bias Effects in Medical Prediction Models

Kara Liu, Maggie Wang, Russ B. Altman

The paper proposes a novel, practical upper bound to estimate the worst-case performance of medical prediction models on the target population, even when the selection bias mechanism and target data a…

cs.CL · Recent · May 30, 2026

ProtStructQA: A Denotation Threshold in Protein Structural Reasoning

Aravind Mandiga, Guoming Li, Jin Lu, Ismailcem Budak Arpinar +2 more

The paper introduces ProtStructQA, an executable benchmark that tests protein structural reasoning by requiring language models to generate measurable 3D coordinates, revealing a capability-dependent…

cs.LG, cs.AI, cs.CR · Recent · May 28, 2026

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

Anany Kotawala

The paper introduces NumLeak, a framework demonstrating that top-tier LLMs often exhibit high fidelity recall of specific public numeric benchmarks (like financial factors) due to memorization, which…
