Papers similar to 2605.31370

~ similar to 2605.31370· 20 results

cs.CLcs.AIcs.CERecentMay 28, 2026

MOOSE-Copilot: A Web-Based Interactive Assistant for Unified Exploratory and Fine-Grained Scientific Hypothesis Discovery

MOOSE-Copilot is a novel web-based framework that unifies scientific hypothesis discovery by formalizing human-AI interaction, significantly improving performance over autonomous LLM baselines.

View →

cs.SEcs.AIcs.CRRecentMar 21, 2026

AEGIS: From Clues to Verdicts -- Graph-Guided Deep Vulnerability Reasoning via Dialectics and Meta-Auditing

Sen Fang, Weiyuan Ding, Zhezhen Cao, Zhou Yang +1 more

AEGIS is a novel multi-agent framework that grounds vulnerability reasoning by reconstructing per-variable dependency chains over a Code Property Graph, achieving state-of-the-art performance on the P…

View →

cs.CLcs.LGRecentMay 31, 2026

Unlocking the Black Box of Latent Reasoning: An Interpretability-Guided Approach to Intervention

Shuochen Chang, Tong Bai, Xiaofeng Zhang, Qianli Ma +4 more

This paper introduces interpretability-guided, training-free interventions that systematically improve the accuracy and controllability of latent reasoning in LLMs by leveraging structural and causal…

View →

cs.AIRecentMay 27, 2026

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Linas Nasvytis, Simon Jerome Han, Ben Prystawski, Satchel Grant +2 more

The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…

View →

cs.AIcs.LGRecentMay 31, 2026

Tackling the Root of Misinformation by Teaching Laypeople about Logical Fallacies via Socratic Questioning and Critical Argumentation

Minjing Shi, Junling Wang, Jingwei Ni, Sankalan Pal Chowdhury +1 more

The paper introduces LFTutor, an intelligent tutoring system leveraging LLMs and Socratic questioning to teach laypeople about logical fallacies, demonstrating its effectiveness in fostering critical…

View →

cs.AIRecentMay 27, 2026

Bridging the Detection-to-Abstention Gap in Reasoning Models under Insufficient Information

Renjie Gu, Jiaxu Li, Yihao Wang, Yun Yue +7 more

The paper addresses the 'detection-to-abstention gap' in reasoning models, where detecting insufficient information does not lead to abstention, by proposing a novel control framework that forces mode…

View →

cs.HCcs.AIcs.LGRecentMay 28, 2026

Rationalize: Shared Semantic Reasoning for Human-AI Alignment

Aritra Dasgupta, Naga Datha Saikiran Battula, Avina Nakarmi, Sohom Sen +2 more

The paper introduces Rationalize, a role-pair framework that facilitates shared semantic reasoning between humans and AI models to achieve deep alignment of intent and action.

View →

cs.CRcs.AIcs.MARecentMar 23, 2026

STRIATUM-CTF: A Protocol-Driven Agentic Framework for General-Purpose CTF Solving

James Hugglestone, Samuel Jacob Chacko, Dawson Stoller, Ryan Schmidt +1 more

The paper introduces STRIATUM-CTF, a modular agentic framework that uses a standardized context protocol to enable LLMs to perform multi-step, stateful reasoning for general-purpose CTF solving, achie…

View →

cs.CLRecentMay 29, 2026

What Am I Missing? Question-Answering as Hidden State Probing

Chu Fei Luo, Samuel Dahan, Xiaodan Zhu

The paper proposes using question-asking as an inference-time intervention to probe a language model's hidden state, finding that the self-diagnosis process provides a predictive signal for final corr…

View →

cs.AIRecentJun 1, 2026

BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning

Shannon Serrao, Soumitra Chatterjee, Dorina Strori, Abhishek Sharma +1 more

BADGER is a unified, production-grade evaluation framework that integrates text-to-SQL assessment with agentic behavior evaluation, significantly outperforming existing benchmarks on industry queries.

View →

cs.CLRecentMay 31, 2026

Not All Explanations Simulate Equally: Comparing Verbalized Feature Attributions and Self-Generated Rationales

Pingjun Hong, Benjamin Roth

The paper compares verbalized feature attributions and self-generated rationales for explaining model behavior, finding that the format and granularity of the explanation significantly affect its abil…

View →

cs.CLcs.AIcs.LGRecentMay 29, 2026

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

LongTraceRL addresses long-context reasoning challenges by generating highly challenging training data and introducing a fine-grained rubric reward, significantly improving evidence-grounded reasoning…

View →

cs.AIRecentMay 27, 2026

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

Yubo Li, Ramayya Krishnan, Rema Padman

The paper identifies a failure mode called unfaithful capitulation (UC), where reasoning models maintain a correct internal thought process (chain-of-thought) but output an incorrect final answer when…

View →

cs.CLcs.AIRecentMay 28, 2026

Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models

Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more

The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…

View →

cs.AIcs.CLcs.LGRecentMay 29, 2026

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

Tianyi Zhou, Dongrui Liu, Leitao Yuan, Jing Shao +1 more

COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.

View →

cs.CLRecentMay 29, 2026

MoG: Mixture of Experts for Graph-based Retrieval-Augmented Generation

Zheng Yuan, Chuang Zhou, Linhao Luo, Siyu An +3 more

MoG proposes a novel Mixture of Experts framework for graph-based RAG, which uses hub graphs to guide the sparse activation of domain-specific expert graphs, significantly improving retrieval accuracy…

View →

cs.CLcs.AIRecentMay 28, 2026

Teaching Language Models to Check Grounded Claim Factuality with Human Test-Taking Strategies

Yuxuan Ye, Raul Santos-Rodriguez, Edwin Simpson

The paper proposes a novel, efficient method for checking the factuality of claims generated by LLMs by framing it as a true/false reading comprehension task and incorporating explicit test-taking str…

View →

cs.CRcs.AIRecentMay 10, 2026

Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright, Reagan Johnston +2 more

The paper introduces Oracle Poisoning, an attack that corrupts knowledge graphs used by AI agents, demonstrating that all tested models blindly trust poisoned data at high sophistication levels.

View →

cs.AIcs.CRcs.IRRecentApr 3, 2026

AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models

Yuntao Du, Minh Dinh, Kaiyuan Zhang, Ninghui Li

AutoVerifier is an LLM-based agentic framework that automates the end-to-end verification of complex technical claims, enabling non-experts to generate evidence-backed intelligence assessments.

View →

cs.CLcs.AIRecentMay 31, 2026

CA-BED: Conversation-Aware Bayesian Experimental Design

Daniel Arnould, Rashad Aziz, Zixuan Kang, Tanav Changal +4 more

CA-BED is a novel framework that improves LLM performance in interactive question-answering by integrating Bayesian Experimental Design to strategically select questions that maximize information gain…

View →