~ similar to 2605.31446· 20 results
DiffuSent proposes a non-auto-regressive diffusion framework to unify Aspect-Based Sentiment Analysis (ABSA), significantly improving boundary detection for multi-word aspect and opinion terms.
The paper proposes an Interpretive Audit Pipeline to evaluate LLMs for public comment analysis, arguing that measuring inter-model disagreement is crucial because standard accuracy metrics fail to det…
Yutong Cheng, Changze Li, Raihan Sultan Pasha Basuki, Qian Cui +2 more
TTPrint proposes a novel diverge-then-converge framework for extracting MITRE ATT&CK techniques from CTI reports, significantly improving both recall and precision compared to existing methods.
AutoVerifier is an LLM-based agentic framework that automates the end-to-end verification of complex technical claims, enabling non-experts to generate evidence-backed intelligence assessments.
The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…
Renfei Dang, Xinye Wang, Zhejian Lai, Weilu Xu +4 more
The paper proposes RIEQE, a two-stage training framework that synergistically co-evolves implicit and explicit reasoning capabilities in Large Reasoning Models (LRMs) to significantly improve fine-gra…
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
The paper proposes a novel, efficient method for checking the factuality of claims generated by LLMs by framing it as a true/false reading comprehension task and incorporating explicit test-taking str…
Shuheng Cao, Ruiqi Chen, Renjie Cao, Zhenhao Zhang +2 more
The paper introduces BioConCal, a supervised scoring mechanism that evaluates biomedical NER candidates surfaced by multiple LLMs, significantly improving the quality of the candidate pool for human c…
The paper introduces CERA, a novel contrastive retrieval framework that improves RAG factuality and interpretability by using subjectivity-based hard negative selection and an auxiliary attention alig…
Yeqi Huang, Yue Chen, Yanwei Ye, Guanhao Su +1 more
The paper introduces Ryze, an automated system that synthesizes evidence-enriched Question-Answering (QA) pairs from raw biomedical papers, resulting in a specialized VLM (BioVLM-8B) that significantl…
Cheng Meng, Wenxin Le, Xinyi Li, Qiuyun Wang +3 more
The paper proposes UniRule, a novel agentic RAG framework that unifies the detection rule generation process by mapping context and language to rules, significantly outperforming pure LLM generation.
Benedetta Muscato, Beiduo Chen, Gizem Gezici, Barbara Plank +1 more
This paper proposes a unified evaluation framework for hate speech detection that systematically assesses model performance and explainability across various label and rationale representation spaces,…
The paper introduces Semantic Triplet Restoration (STR), a novel protocol that converts complex table structures into atomic semantic triplets, improving table question answering by providing explicit…
The paper introduces a robust, four-mechanism LLM pipeline that generates auditable, evidence-grounded structured trait records for hundreds of thousands of diverse species across multiple taxa.
Yongsik Seo, Wooseok Jeong, Eunyoung Kim, Hyeonseo Jang +1 more
The paper introduces CITETRACE, a large-scale dataset and evaluation framework that systematically measures structural citation failures in search-augmented LLMs, revealing a pattern called Verified M…
Aravind Mandiga, Guoming Li, Jin Lu, Ismailcem Budak Arpinar +2 more
The paper introduces ProtStructQA, an executable benchmark that tests protein structural reasoning by requiring language models to generate measurable 3D coordinates, revealing a capability-dependent…
The paper introduces a cross-encoder re-ranker trained on attribution scores to improve the retrieval of highly relevant citation passages for legal question answering, outperforming standard semantic…
Chen He, Yuhao Wu, Lei Wang, Wenxuan Zhang +1 more
The paper identifies and demonstrates that post-conclusion continuation in answer-correct long-CoT traces is harmful during LLM fine-tuning, proposing a method to cut this continuation.
This paper conducts a large-scale audit of human annotation reporting in NLP, finding that while reporting has improved, critical details needed to assess annotation validity, such as training and agr…