Papers similar to 2606.04717v1

~ similar to 2606.04717v1· 20 results

cs.AIRecentMay 27, 2026

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

The paper identifies a failure mode called unfaithful capitulation (UC), where reasoning models maintain a correct internal thought process (chain-of-thought) but output an incorrect final answer when…

View →

cs.AIRecentMay 27, 2026

Better Accuracies, Worse Reasoning: A Step-Level Audit of Medical Chain-of-Thought Distillation

Zhaoyang Jiang, Xuanqi Peng, Fei Teng, Zhizhong Fu +4 more

The paper demonstrates that while distilling large language models for medical QA can significantly improve final answer accuracy, this gain often comes at the cost of factual accuracy and detailed re…

View →

cs.CLcs.LGRecentMay 30, 2026

Towards Lightweight Reliability: Using Soft Prompts for Hallucination Mitigation in Large Language Models

S M Tahmid Siddiqui, Akib Jawad Ononto, Anoop Singhal, Latifur Khan

The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…

View →

cs.AIcs.LGRecentJun 1, 2026

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

Ekaterina Alimaskina, Darya Rudas, Denis Shveykin, Gleb Molodtsov +2 more

The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…

View →

cs.CRRecentJun 3, 2026

What Can Verifiable Decapsulation Tests Certify? Pass Bounds and Fault-Recognition Limits for FO-Based KEMs

José Luis Delgado Jiménez

The paper analyzes the security limits of verifiable decapsulation tests for Key Encapsulation Mechanisms (KEMs), establishing that the list-hit event is the primary black-box obstruction and deriving…

View →

cs.CRcs.AIcs.LGRecentMay 11, 2026

Acceptance Cards:A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims

Phongsakon Mark Konrad, Toygar Tanyel, Serkan Ayvaz

The paper introduces Acceptance Cards, a rigorous four-diagnostic standard, to provide a comprehensive and reliable evaluation protocol for claims of safe fine-tuning defenses.

View →

cs.CLcs.AIRecentMay 27, 2026

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

Eric Onyame, Runtao Zhou, Kowshik Thopalli, Bhavya Kailkhura +1 more

This study demonstrates that Chain-of-Thought (CoT) monitoring is fundamentally fragile and unreliable for detecting misaligned behavior across typologically diverse languages, especially in low-resou…

View →

cs.CRcs.SERecentApr 30, 2026

How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection

Maofei Chen, Laifu Wang, Yue Qin, Yuan Wang +2 more

The paper demonstrates that using raw source text for fine-tuning LLMs on vulnerability detection causes high false-positive rates by memorizing surface-level syntax, a problem mitigated by using Abst…

View →

cs.CRcs.AIRecentMay 13, 2026

Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models

Shuqiang Wang, Wei Cao, Jiaqi Weng, Jialing Tao +3 more

The paper proposes a black-box attack using a hierarchical genetic algorithm to induce 'overthinking' in Large Reasoning Models, demonstrating that this vulnerability can cause significant resource ex…

View →

cs.CRRecentApr 10, 2026

Unreal Thinking: Chain-of-Thought Hijacking via Two-stage Backdoor

Wenhan Chang, Tianqing Zhu, Ping Xiong, Faqian Guan +1 more

The paper proposes Two-stage Backdoor Hijacking (TSBH) to create persistent, trigger-activated malicious behaviors by manipulating the observable Chain-of-Thought (CoT) process in Large Language Model…

View →

cs.LGcs.AIcs.CLRecentJun 3, 2026

Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)

Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar +1 more

This paper proposes a method to recover recoverability structure from failed traces of post-trained language models, enabling test-time routing and post-training analysis.

View →

cs.LGcs.AIRecentMay 31, 2026

ThinkSwitch: Context Distillation with LoRA and Weight Interpolation for Specific-Purpose Reasoning Tasks

Dhruv Saini, Rohan Pandey

ThinkSwitch introduces a low-compute co-training procedure that distills the reasoning benefit of large language models into weights, significantly improving performance on specific reasoning tasks.

View →

cs.LGcs.AIcs.CRRecentMay 3, 2026

Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance

Anamika Paul Rupa, Anietie Andy

The paper introduces Probe-Geometry Alignment (PGA), a surgical method that removes the measurable cross-sequence memorization signature from large language models without degrading their general capa…

View →

cs.LGcs.AIcs.CRRecentMay 28, 2026

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

Anany Kotawala

The paper introduces NumLeak, a framework demonstrating that top-tier LLMs often exhibit high fidelity recall of specific public numeric benchmarks (like financial factors) due to memorization, which…

View →

cs.LGcs.AIcs.CRRecentMay 28, 2026

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

Anany Kotawala

The paper introduces NumLeak, a framework demonstrating that top-tier LLMs often exhibit high fidelity recall of specific public numeric benchmarks, suggesting that their apparent skill may be due to…

View →

cs.CLcs.AIcs.LGRecentMay 27, 2026

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

Sachin Kumar

This paper systematically diagnoses the failure modes of linear deception probes in LLMs, finding that while single-direction probes are insufficient, multi-dimensional probes can recover robust detec…

View →

cs.AIcs.CLcs.LGRecentMay 31, 2026

An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models

Mingzhong Sun, Teresa Yeo, Armando Solar-Lezama, Tan Zhi-Xuan

This paper investigates the production-evaluation gap in Large Reasoning Models (LRMs), finding that while LRMs excel at generating solutions, they struggle significantly to evaluate flawed reasoning,…

View →

cs.CLRecentMay 28, 2026

Probing the Prompt KV Cache: Where It Becomes Dispensable

Vinayshekhar Bannihatti Kumar, Manoj Ghuhan Arivazhagan, Disha Makhija, Rashmi Gangadharaiah

This paper investigates the redundancy of the prompt KV cache during language model decoding, finding that the structure provided by chat templates is the primary source of redundancy, not the actual…

View →

cs.CLRecentJun 1, 2026

DECK: A Consistency x Confidence Taxonomy of LLM Hallucinations

Mohit Singh Chauhan

The paper introduces the DECK taxonomy, a novel framework that classifies LLM hallucinations not by their content error, but by their detectability signature based on inter-sample consistency and toke…

View →

cs.AIcs.CLRecentMay 27, 2026

The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic

Dominika Agnieszka Długosz, Arlindo Oliveira, Natalia Díaz-Rodríguez

The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…

View →