Papers similar to 2606.00250

~ similar to 2606.00250 · 20 results

cs.CY · cs.AI · Recent · May 31, 2026

Beyond Access: Guided LLM Scaffolding for Independent Learning in Undergraduate Statistics

Mohammad Amanlou, Yasaman Amou-Jafari, Mehrad Livian, Fatemeh Boloukazari +2 more

This study compares different levels of LLM access in a statistics course, finding that structured, guided use significantly improves students' reasoning skills and independent learning compared to un…

cs.AI · Recent · May 28, 2026

Double-Edged Sword or Sharp Tool? Designing and Evaluating Triadic LLM-Teacher Collaboration for K-12 Writing at Scale

Canran Wang, Yuwen Yang, Zhen Wang, Ming Ma +4 more

The paper designs and evaluates a triadic LLM-Teacher collaboration system for K-12 writing, finding that strategic labor division between the LLM and teacher effectively improves writing quality but…

cs.IR · cs.AI · cs.CY · Recent · May 27, 2026

Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation

Annabella Sánchez-Guzmán, Lukas Eberhard, Denis Helic, Lisette Espín-Noboa

The paper proposes a comprehensive benchmark to systematically audit how varying persona prompts and model choices affect the technical quality and social representativeness of scholar recommendations…

cs.AI · Empirical · Recent · Jun 11, 2026

Automated reproducibility assessments in the social and behavioral sciences using large language models

Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten, Felix Henninger +6 more

This paper shows that large language models can automate reproducibility assessments in the social and behavioral sciences.

stat.OT · cs.AI · Empirical · Recent · Jun 9, 2026

Flaws in the LLM Automation Narrative

George Perrett, Javae Elliott, Jennifer Hill, Marc Scott

This paper evaluates the performance of a Large Language Model (LLM) in a high-stakes context by comparing it to human experts and measuring variance and error magnitude.

cs.CR · cs.AI · Recent · Jun 2, 2026

"Important You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

Hang Li, Fedor Filippov, Yuling Lin, Pengfei He +5 more

This paper investigates the vulnerability of LLM-based automatic grading systems to prompt injection (PI) attacks, demonstrating that current systems are highly susceptible to manipulation that can le…

cs.CL · cs.AI · Recent · Jun 1, 2026

Identifying High-Confidence Social Biases in LLMs for Trustworthy Conversational Tutoring Agents

Aitor Arronte Alvarez, Naiyi Xie Fincham

This study evaluates LLMs in conversational tutoring to identify high-confidence social biases, finding that state-of-the-art models are often overconfident in their incorrect assessments of stereotyp…

cs.CR · cs.AI · cs.CY · Recent · May 7, 2026

Detecting Verbatim LLM Copy-Paste in Homework

Aizierjiang Aiersilan

The paper proposes SteganoPrompt, an input-side watermark embedded in the assignment prompt that forces LLMs to generate a detectable signature in their output, thereby exposing verbatim copy-pasting.

cs.CL · Recent · May 30, 2026

IDEAFix: Evaluation Framework for Creative Defixation Prompting in LLMs

F. Carichon, S. Sharma, M. Girard, R. Rampa +1 more

The paper introduces IDEAFix, a systematic evaluation framework designed to analyze how structured prompting and task design influence the divergent thinking and originality of idea generation in LLMs…

cs.CL · cs.AI · cs.LG · Recent · May 30, 2026

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez

The paper demonstrates that LLM performance in zero-shot annotation is significantly limited by the alignment between the model's internal understanding and the task definition, showing that prompt-ba…

cs.AI · cs.CL · Recent · May 28, 2026

Demystifying Data Organization for Enhanced LLM Training

Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang +7 more

This paper proposes four guidelines and two novel data ordering methods (STR and SAW) to systematically optimize data organization, significantly enhancing the stability and performance of LLM trainin…

cs.AI · cs.CL · Recent · May 28, 2026

PRAIB: Peer Review AI Benchmark of Behaviour of LLM-Assisted Reviewing

Krzysztof Żurawicki, Julia Farganus, Arkadiusz Gaweł, Mateusz Bystroński +1 more

The paper introduces PRAIB, a benchmark that demonstrates that LLM-generated peer reviews, while often verbose, systematically diverge from human norms by being less variable, positively biased, and f…

cs.AI · cs.LG · Recent · May 28, 2026

When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs

Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more

Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.

cs.AI · cs.CL · cs.HC · Recent · May 27, 2026

Mind Your Tone: Does Tone Alter LLM Performance?

Om Dobariya, Akhil Kumar

This study demonstrates that the tone of a prompt significantly affects the accuracy of various LLMs, requiring users to exercise caution regarding tone-robust reliability.

cs.CR · Recent · May 10, 2026

Permit: Permission-Aware Representation Intervention for Controlled Generation in Large Language Models

Pengcheng Sun, Lan Zhang, Zhaopeng Zhang, Jiewei Lai +1 more

Permit is a novel framework that enforces fine-grained, permission-aware control over the hidden states of LLMs, preventing information leakage even when sensitive data is present in the context.

cs.CL · cs.AI · Recent · Jun 1, 2026

Argument Collapse: LLMs Flatten Long-Form Public Debate

Yekyung Kim, Yapei Chang, Chau Minh Pham, Mohit Iyyer

The paper demonstrates 'argument collapse,' showing that LLMs tend to converge on a small, repetitive set of polished arguments when generating long-form public debates, significantly reducing the div…

cs.CR · cs.CL · cs.LG · Recent · May 28, 2026

Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking across Datasets, Models, and Generated Content

Bing Liu, Shunping Wang, Yufan Zhu, Xinyi Yu +4 more

This paper introduces 'implicit identity' as a unifying framework to survey and categorize LLM fingerprinting and watermarking techniques for verifying ownership and provenance across datasets, models…

cs.CL · Recent · May 28, 2026

When English Rewrites Local Knowledge: Global Narrative Dominance in Large Language Models

Md Arid Hasan, Ruwad Naswan, Farhan Samir, Sharifa Sultana +1 more

The paper demonstrates that using English prompts causes large language models to prioritize globally dominant narratives over local cultural knowledge, even when local evidence is provided.

cs.AI · Recent · May 28, 2026

Temporal Stability and Few-Shot Prompting in Math Task Assessment

Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn

This study investigated the stability and prompt-responsiveness of AI tools in classifying the cognitive demand of math tasks, finding that few-shot prompting was a more reliable performance booster t…
