Similar to 2606.02509 · 17 results
Melike Akca, Mona Giff, Deniz Cetinkaya, Huseyin Dogan +1 more
This paper introduces a Generative AI-augmented UXR methodology, grounded in the UXR Point of View (PoV) Playbook, to design Neuroinclusive digital interventions for emotional regulation in adults wit…
This paper shows that large language models can automate reproducibility assessments in the social and behavioral sciences.
The paper introduces a 'replication-first' paradigm for LLM behavioral benchmarking, demonstrating that this rigorous approach uncovers significant, non-obvious performance drops between successive mo…
This study investigated the stability and prompt-responsiveness of AI tools in classifying the cognitive demand of math tasks, finding that few-shot prompting was a more reliable performance booster t…
Jeyeon Eo, Joo Young Kim, Ran Ju, Minyoung Jung +1 more
BuddyBench introduces a novel, privacy-constrained multi-task benchmark that integrates longitudinal learning trajectories, standardized clinical assessments, and randomized trial data to advance pedi…
Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more
Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.
Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more
The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…
The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…
AutoForest is an end-to-end system that automatically generates publication-ready forest plots directly from biomedical papers, streamlining the labor-intensive process of meta-analysis.
The paper introduces an LLM-based pipeline that tags learning resources with structured competencies, achieving strong performance while providing traceable evidence and leveraging graph constraints.
The paper introduces a novel production-based evaluation showing that child-directed speech (CDS) significantly improves a BabyLM's ability to generate grammatically correct language, even if standard…
The paper introduces the DECK taxonomy, a novel framework that classifies LLM hallucinations not by their content error, but by their detectability signature based on inter-sample consistency and toke…
This study compares multiple post-hoc explainable AI methods (e.g., DeepSHAP, GradCAM) to interpret how deep learning models use EEG data to detect Major Depressive Disorder, finding that while method…
This paper investigates why self-harm prediction models struggle to generalize across different hospitals, finding that variations in local lexical expression and feature importance are the primary ca…
Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more
The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…
This study evaluates LLMs in conversational tutoring to identify high-confidence social biases, finding that state-of-the-art models are often overconfident in their incorrect assessments of stereotyp…
The paper audits six LLMs across four languages, finding that their gender stereotyping is significantly wider than human baselines and that cross-lingual translation fundamentally alters the nature o…