ArXivCSExplorer
Built by Teycir Ben Soltane

20 results for “Understanding of the Popperian falsificationist approach”

CS papers only

Hybrid search: keyword + semantic, ranked by combined score.


cs.SE • cs.CL • Empirical • Recent • Jun 4, 2026

Scaffold, Not Vocabulary? A Controlled, Two-Tier, Pre-Registered Study of a Popperian Code-Generation Skill

Mehmet Iscan

This paper investigates whether the gains from using a Popperian falsificationist prompt skill in large language models are due to the skill's content or its structure.

cs.AI • Recent • May 29, 2026

Formalizing and falsifying causal pathways of rare events

Anahita Haghighat, Dominik Janzing

The paper formalizes the concept of a causal pathway for rare events, showing that testable implications can be derived solely from this pathway abstraction, simplifying complex causal modeling.

cs.LG • cs.AI • cs.CR • Recent • Mar 26, 2026

Why Safety Probes Catch Liars But Miss Fanatics

Kristiyan Haralambiev

The paper demonstrates that current safety probes designed to detect deceptive AI fail when the model adopts a coherent misalignment, where the model genuinely believes its harmful behavior is virtuou…

cs.AI • cs.CL • cs.LG • Recent • May 31, 2026

An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models

Mingzhong Sun, Teresa Yeo, Armando Solar-Lezama, Tan Zhi-Xuan

This paper investigates the production-evaluation gap in Large Reasoning Models (LRMs), finding that while LRMs excel at generating solutions, they struggle significantly to evaluate flawed reasoning,…

cs.AI • cs.LG • Recent • May 31, 2026

Tackling the Root of Misinformation by Teaching Laypeople about Logical Fallacies via Socratic Questioning and Critical Argumentation

Minjing Shi, Junling Wang, Jingwei Ni, Sankalan Pal Chowdhury +1 more

The paper introduces LFTutor, an intelligent tutoring system leveraging LLMs and Socratic questioning to teach laypeople about logical fallacies, demonstrating its effectiveness in fostering critical…

cs.AI • Recent • May 31, 2026

Diagnosing LLM Arbitration Behavior over Pre-evidence Epistemic States in RAG-based Fact-Checking

Yuxi Sun, Wenbo Shang, Wei Gao, Xin Huang +1 more

The paper introduces a diagnostic testbed, PAVE, to evaluate how LLMs arbitrate between their internal knowledge and retrieved evidence during fact-checking, revealing that this arbitration is unrelia…

cs.MA • cs.AI • cs.CL • Recent • May 28, 2026

Social Reasoning in Machines: Investigating Collective Truth-Seeking Dynamics in Large Language Model Debate

Tom Pecher

This paper simulates the Argumentative Theory of Reasoning (ATR) using multi-agent debate among LLMs, demonstrating that collective adversarial discourse significantly enhances truth-seeking performan…

cs.LG • cs.AI • Recent • May 27, 2026

FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks

Nishal Thomas, Noel Thomas

The paper introduces FormInv, a measurement protocol that reveals significant semantic inconsistencies in existing mathematical reasoning benchmarks, showing that standard accuracy metrics fail to cap…

cs.CY • cs.AI • cs.MA • Recent • May 28, 2026

Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

Botao Amber Hu, Helena Rong, Max Van Kleek

The paper argues that traditional identity-based reputation mechanisms are structurally inapplicable to language model agents because their mutable, modular nature makes them ontologically dissociativ…

cs.CR • cs.AI • Recent • Apr 22, 2026

Mythos and the Unverified Cage: Z3-Based Pre-Deployment Verification for Frontier-Model Sandbox Infrastructure

Dominik Blain

The paper introduces COBALT, a Z3 SMT-based formal verification engine, to proactively detect arithmetic vulnerabilities (CWE-190/191/195) in the critical infrastructure surrounding frontier AI models…

cs.AI • cs.CY • q-fin.RM • Recent • May 27, 2026

The Ethics of LLM Sandbox and Persona Dynamics

Tim Gebbie, Stewart Gebbie

The paper argues that LLM guardrails and persona dynamics create an unethical 'reality gap' by laundering epistemic risk onto users, advocating for task-level causal requirements over response-level m…

cs.CR • cs.LG • cs.SE • Recent • May 16, 2026

The Range Shrinks, the Threat Remains: Re-evaluating LLM Package Hallucinations on the 2026 Frontier-Model Cohort

Aleksandr Churilov

This study re-evaluates LLM package hallucination rates on a new cohort of frontier models, finding a significant reduction in overall hallucination rates but identifying a persistent, model-agnostic…

cs.AI • Recent • May 27, 2026

Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG

Pin Qian, Su Wang, Xiaoyuan Wang, Yihang Chen +6 more

The paper introduces FORCEBENCH, a new stress test designed to evaluate whether cited sources genuinely warrant the strength of a claim, revealing that standard citation evaluation methods often fail…

cs.CR • Recent • May 4, 2026

PHANTOM: Polymorphic Honeytoken Adaptation with Narrative-Tailored Organisational Mimicry

Abraham Itzhak Weinberg

PHANTOM is a novel framework that generates highly convincing, context-aware honeytokens by incorporating deep organizational knowledge, significantly improving their believability and detection resis…

cs.LG • cs.CR • Recent • Apr 13, 2026

Reducing Hallucination in Enterprise AI Workflows via Hybrid Utility Minimum Bayes Risk (HUMBR)

Chenhao Fang, Jordi Mola, Mark Harman, Jason Nawrocki +9 more

The paper introduces a Hybrid Utility Minimum Bayes Risk (HUMBR) framework to significantly reduce hallucinations in high-stakes enterprise AI workflows, outperforming standard consistency methods.

cs.CR • Recent • Apr 7, 2026

SoK: Understanding Anti-Forensics Concepts and Research Practices Across Forensic Subdomains

Janine Schneider, Florian Ramming, Maximilian Eichhorn, Gaston Pugliese +8 more

This paper systematically analyzes 123 publications on anti-forensics to quantify techniques and attack vectors, identify research patterns, and propose directions for a more coherent and ethical unde…

cs.CL • cs.AI • Recent • May 28, 2026

Domain Adaptation and Reasoning Frameworks in Language Models: A Controlled Experiment with Historical Cosmology

Francesco De Bernardis

The study demonstrates that domain adaptation primarily reshapes the linguistic explanatory framework of language models, causing shifts in cosmological stance secondarily, rather than directly modify…

cs.AI • Recent • May 31, 2026

The Case for Model Science: Verify, Explore, Steer, Refine

Przemyslaw Biecek, Luca Longo, Jianlong Zhou, Thomas Fel +2 more

The paper advocates for the establishment of Model Science, a systematic discipline that moves beyond simple benchmarking to deeply analyze AI models' internal workings and failure modes.

cs.LG • cs.AI • Recent • May 28, 2026

Honest Lying: Understanding Memory Confabulation in Reflexive Agents

Prakhar Dixit, Sadia Kamal, Tim Oates

The paper demonstrates that self-reflective agents can systematically confabulate incorrect memories, leading them to fail tasks even when the environment resets, and proposes a metric and mitigation…

cs.AI • Recent • May 28, 2026

ProjectionBench: Evaluating Scientific Hypothesis Generation in LLMs Under Progressive Information Disclosure

A. J. Lew, Y. Cao, M. J. Buehler

The paper introduces ProjectionBench, a novel benchmark that progressively discloses information to evaluate LLMs' ability to generate scientific hypotheses, demonstrating that advanced models like GP…
