~ similar to 2605.30015· 20 results
This paper systematically evaluates the consistency of popular causal discovery benchmarks against real-world scientific literature, revealing significant variability in their accuracy.
The paper introduces novel compatibility and incompatibility scores to evaluate collections of bivariate causal statements, providing a way to assess causal claims when ground truth is unavailable.
The paper formalizes the concept of a causal pathway for rare events, showing that testable implications can be derived solely from this pathway abstraction, simplifying complex causal modeling.
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.
Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar +1 more
This paper proposes a method to recover recoverability structure from failed traces of post-trained language models, enabling test-time routing and post-training analysis.
The paper proposes graph-coupled causal Bayesian optimization, a method that improves efficiency by sharing information across related interventions through a shared set of causal parameters.
This paper introduces an entropy-based method to generate multiple plausible causal maps (atlases) that accurately reflect the inherent structural ambiguity in complex systems, moving beyond single, o…
Zihan Chen, Yiming Zhang, Wenxiang Geng, Zenghui Ding +1 more
The paper theoretically explains that optimizing LLMs solely on outcomes leads to brittle reasoning (Reward-Induced Manifold Collapse) by favoring low-complexity shortcuts, and proposes process-based…
Giuliano Martinelli, Piriyakorn Piriyatamwong, Abelardo Carlos Martinez Lorenzo, Jasmin Baier +6 more
The paper introduces Query2Effect, a large-scale benchmark, and a two-step framework to predict causal effect sizes from natural language queries, showing that structured representation significantly…
Suryash Yagnik, Shubham Gaur, Saksham Thakur, Vinija Jain +2 more
The paper introduces 5WBENCH, a new benchmark for causal unlearning, and proposes MAAT, a novel three-phase framework that achieves high forgetting and high retention specifically on complex 'Why'-typ…
CORE-MTL proposes a representation-centric framework that uses causal orthogonal representations to disentangle task-relevant structure from nuisance variation in multi-task learning, achieving superi…
The paper introduces causal density functions, which are local density ratios that allow for the pointwise estimation and scoring of directed causal influence by comparing interventional and observati…
The paper introduces Nested Contextual Causal Bandits (NCCBs) to model multi-timescale sequential decisions and proposes a certified policy optimization method, NCTS, that provides quantifiable risk b…
Zheng Lu, Mingqi Gao, Qinlei Xie, Wanqi Zhong +7 more
The paper argues that current embodied planning benchmarks prioritize superficial language prediction over true physical reasoning, introducing new benchmarks and a large-scale dataset to demonstrate…
The paper introduces HF-KCU, an efficient and robust method for performing causal unlearning in federated learning by approximating influence reversal, achieving significant speedups while maintaining…
Kaixiang Zhao, Tianrun Yu, Shawn Huang, Porter Jenkins +2 more
TIGER is an inference-time framework that uses graph-based evidence routing to independently assess and repair unsupported facts (hallucinations) in multimodal generation.
Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye +3 more
This paper proposes a new framework called STRIDE for training data attribution in Large Language Models.
Shuaike Li, Kai Zhang, Xianquan Wang, Jiachen Liu +1 more
The paper introduces Causal Editing (CODE), a new paradigm that improves knowledge updates in LLMs by grounding fact injection in causal narratives, drastically reducing self-refutation rates.
The paper extends modular dynamic Bayesian networks (MDBNs) to model non-Markovian queues, providing the first causal metamodeling technique for such systems with significant speedup.
The paper introduces the Causal Sensitivity Score (CSS), an interventional metric that reveals that standard coverage-based evaluations fail to detect critical responsiveness deficits in clinical LLMs…