~ similar to 2606.00683· 20 results
Hao Chen, Xing Tang, Qirui Liu, Weijie Shi +5 more
The paper introduces the Data-centric Reasoning Compiler (DCRC), a novel data-driven framework that enhances financial QA systems by compiling user queries and retrieved documents into verifiable, exe…
The paper proposes InSemRAG, an enhanced RAG framework that improves retrieval accuracy and knowledge integrity by incorporating intent-aware retrieval and semantics-preserving chunking, achieving sta…
Pengyu Chen, Yonggang Zhang, Mingming Chen, Jun Song +2 more
The paper proposes a graph-constrained approach to scale multi-hop training data by decoupling path discovery from path verbalization, significantly expanding the usable corpus size for LLMs.
This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…
ThinkSwitch introduces a low-compute co-training procedure that distills the reasoning benefit of large language models into weights, significantly improving performance on specific reasoning tasks.
Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more
The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.
The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…
SCOPE introduces a data-free self-play framework that co-evolves a task-generating Challenger and a document-answering Solver, significantly improving open-ended performance on language models without…
Wenhan Xiao, Ziwei Zhang, Chuanyue Yu, Xingcheng Fu +3 more
CRITIC-R1 introduces a structured critic framework that treats RAG critique as an explicit error diagnosis problem using reinforcement learning, significantly improving answer quality over strong RAG…
The paper systematically evaluates advanced retrieval-augmented generation (RAG) architectures for Cyber Threat Intelligence (CTI), demonstrating that a hybrid graph-text approach significantly improv…
The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…
Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok +5 more
Mellum 2 is an open-weight 12B Mixture-of-Experts (MoE) language model specialized for software engineering, achieving performance competitive with larger models while maintaining the efficiency of a…
Han Zhang, Zihao Tang, Xin Yu, Xiao Liu +7 more
The paper introduces RHELM, a new benchmark designed to test LLMs' long-term memory by simulating realistic, complex, and evolving dialogues that integrate multiple heterogeneous data sources.
This paper proposes a multi-turn retrieval-augmented generation pipeline for conversational systems across four domains.
Wanhao Liu, Jiaqing Xie, Qian Tan, Weida Wang +9 more
The paper introduces OmniMatBench, a comprehensive, human-calibrated multimodal reasoning benchmark covering 19 materials science subfields, revealing that current multimodal language models (MLLMs) h…
The paper evaluates four RAG architectures under knowledge base poisoning, demonstrating that advanced architectures significantly improve robustness against adversarial contradictions, localizing the…
The paper introduces CosmicFish-HRM, a compact language model that achieves adaptive reasoning by dynamically allocating computational effort through a Hierarchical Reasoning Module (HRM), showing tha…
LongTraceRL addresses long-context reasoning challenges by generating highly challenging training data and introducing a fine-grained rubric reward, significantly improving evidence-grounded reasoning…
Zelin Guan, Shengda Zhuo, Zeyan Li, Jinchun He +3 more
E-MIA introduces a novel, stealthy black-box membership inference attack that converts verifiable hard evidence within a candidate document into an objective, multi-part exam score to determine if the…