ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2606.02509· 17 results

cs.HCcs.AIRecentMay 29, 2026

UXR PoV for Neuroinclusive Emotion Regulation

Melike Akca, Mona Giff, Deniz Cetinkaya, Huseyin Dogan +1 more

This paper introduces a Generative AI-augmented UXR methodology, grounded in the UXR Point of View (PoV) Playbook, to design Neuroinclusive digital interventions for emotional regulation in adults wit…

View →
cs.AIEmpiricalRecentJun 11, 2026

Automated reproducibility assessments in the social and behavioral sciences using large language models

Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten, Felix Henninger +6 more

This paper shows that large language models can automate reproducibility assessments in the social and behavioral sciences.

View →
cs.CLcs.AIRecentMay 27, 2026

Let the Results Speak: A Replication-First Paradigm for LLM Behavioral Benchmarking

Yuming, Huang, Yao Liu, Lei Wang +1 more

The paper introduces a 'replication-first' paradigm for LLM behavioral benchmarking, demonstrating that this rigorous approach uncovers significant, non-obvious performance drops between successive mo…

View →
cs.AIRecentMay 28, 2026

Temporal Stability and Few-Shot Prompting in Math Task Assessment

Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn

This study investigated the stability and prompt-responsiveness of AI tools in classifying the cognitive demand of math tasks, finding that few-shot prompting was a more reliable performance booster t…

View →
cs.AIRecentMay 27, 2026

BuddyBench: A Privacy-Constrained Multi-Task Benchmark for Pediatric Social-Communication Personalization

Jeyeon Eo, Joo Young Kim, Ran Ju, Minyoung Jung +1 more

BuddyBench introduces a novel, privacy-constrained multi-task benchmark that integrates longitudinal learning trajectories, standardized clinical assessments, and randomized trial data to advance pedi…

View →
cs.AIcs.LGRecentMay 28, 2026

When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs

Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more

Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.

View →
cs.CLRecentJun 1, 2026

Encoded but Not Routed: Explaining the Table-Chart Gap in Scientific Claim Verification

Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more

The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…

View →
cs.AIRecentMay 28, 2026

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

Kai-Chen Cheng, Haejun Han, David Q. Sun

The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…

View →
cs.CLcs.AIRecentJun 1, 2026

AutoForest: Automatically Generating Forest Plots from Biomedical Studies with End-to-End Evidence Extraction and Synthesis

Massimiliano Pronesti, Angelo Miculescu, Mohsin Kapdi, Paul Flanagan +7 more

AutoForest is an end-to-end system that automatically generates publication-ready forest plots directly from biomedical papers, streamlining the labor-intensive process of meta-analysis.

View →
cs.AIcs.IRRecentMay 27, 2026

From Learning Resources to Competencies: LLM-Based Tagging with Evidence and Graph Constraints

Ngoc Luyen Le, Marie-Hélène Abel, Bertrand Laforge

The paper introduces an LLM-based pipeline that tags learning resources with structured competencies, achieving strong performance while providing traceable evidence and leveraging graph constraints.

View →
cs.CLRecentMay 31, 2026

Child-directed speech facilitates production, not comprehension, in BabyLMs

Bastian Bunzeck, Sina Zarrieß

The paper introduces a novel production-based evaluation showing that child-directed speech (CDS) significantly improves a BabyLM's ability to generate grammatically correct language, even if standard…

View →
cs.CLRecentJun 1, 2026

DECK: A Consistency x Confidence Taxonomy of LLM Hallucinations

Mohit Singh Chauhan

The paper introduces the DECK taxonomy, a novel framework that classifies LLM hallucinations not by their content error, but by their detectability signature based on inter-sample consistency and toke…

View →
cs.LGcs.AIRecentMay 27, 2026

Comparing Post-Hoc Explainable AI Methods for Interpreting Black-Box EEG Models in Depression Detection

Antonia Šarčević, Nikolina Frid

This study compares multiple post-hoc explainable AI methods (e.g., DeepSHAP, GradCAM) to interpret how deep learning models use EEG data to detect Major Depressive Disorder, finding that while method…

View →
cs.CLRecentJun 1, 2026

Why Do Self-Harm Prediction Models Struggle to Generalise? Lexical and Semantic Variations in Emergency Department Triage Notes

Liuliu Chen, Mike Conway, Jo Robinson, Vlada Rozova

This paper investigates why self-harm prediction models struggle to generalize across different hospitals, finding that variations in local lexical expression and feature importance are the primary ca…

View →
cs.HCcs.AIcs.CLRecentMay 28, 2026

LLUMI: Improving LLM Writing Assistance for Mental Health Support with Online Community Feedback

Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more

The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…

View →
cs.CLcs.AIRecentJun 1, 2026

Identifying High-Confidence Social Biases in LLMs for Trustworthy Conversational Tutoring Agents

Aitor Arronte Alvarez, Naiyi Xie Fincham

This study evaluates LLMs in conversational tutoring to identify high-confidence social biases, finding that state-of-the-art models are often overconfident in their incorrect assessments of stereotyp…

View →
cs.CLRecentMay 29, 2026

Anchoring LLM Gender Bias to Human Baselines: A Cross-Lingual Audit

Jiwoo Choi, Seonwoo Ahn, Tongxin Zhang, Seohyon Jung

The paper audits six LLMs across four languages, finding that their gender stereotyping is significantly wider than human baselines and that cross-lingual translation fundamentally alters the nature o…

View →