Papers similar to 2605.29240

~ similar to 2605.29240· 20 results

cs.AI · Recent · May 27, 2026

Practitioner Beliefs and Behaviors in AI-Enhanced Education: DOT Framework Survey Evidence

David Gibson, M. Elizabeth Azukas, Gerald Knezek

This study surveyed higher education practitioners to map their beliefs and behaviors regarding AI integration, finding that while they view AI favorably, institutional barriers and gaps in design-ori…

cs.AI · Recent · May 28, 2026

Temporal Stability and Few-Shot Prompting in Math Task Assessment

Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn

This study investigated the stability and prompt-responsiveness of AI tools in classifying the cognitive demand of math tasks, finding that few-shot prompting was a more reliable performance booster t…

cs.CL, cs.AI · Recent · Jun 1, 2026

Identifying High-Confidence Social Biases in LLMs for Trustworthy Conversational Tutoring Agents

Aitor Arronte Alvarez, Naiyi Xie Fincham

This study evaluates LLMs in conversational tutoring to identify high-confidence social biases, finding that state-of-the-art models are often overconfident in their incorrect assessments of stereotyp…

cs.AI, cs.CL, cs.LG · Recent · May 31, 2026

An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models

Mingzhong Sun, Teresa Yeo, Armando Solar-Lezama, Tan Zhi-Xuan

This paper investigates the production-evaluation gap in Large Reasoning Models (LRMs), finding that while LRMs excel at generating solutions, they struggle significantly to evaluate flawed reasoning,…

cs.CY, cs.AI · Recent · May 31, 2026

Beyond Access: Guided LLM Scaffolding for Independent Learning in Undergraduate Statistics

Mohammad Amanlou, Yasaman Amou-Jafari, Mehrad Livian, Fatemeh Boloukazari +2 more

This study compares different levels of LLM access in a statistics course, finding that structured, guided use significantly improves students' reasoning skills and independent learning compared to un…

cs.CL, cs.CR · Recent · May 9, 2026

BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence

Jialing Gan, Junhao Dong, Songze Li

The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…

cs.AI · Recent · May 29, 2026

PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges

Swastik Roy, Rajkumar Pujari, Tharindu Kumarage, Charith Peris +4 more

PReMISE introduces a framework to audit and improve the quality of rubrics used to guide LLM judges, demonstrating that it can significantly increase judge accuracy and reduce the exploitability of re…

cs.CR, cs.AI · Recent · Jun 2, 2026

"Important You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

Hang Li, Fedor Filippov, Yuling Lin, Pengfei He +5 more

This paper investigates the vulnerability of LLM-based automatic grading systems to prompt injection (PI) attacks, demonstrating that current systems are highly susceptible to manipulation that can le…

cs.HC, cs.AI · Recent · May 28, 2026

Label Over Logic? How Source Cues Bias Human Fallacy Judgments More Than LLMs

Mahjabin Nahar, Nafis Irtiza Tripto, Aiping Xiong, Ting-Hao 'Kenneth' Huang +1 more

The study found that human judgment of logical fallacies is significantly biased by source labels (e.g., human vs. AI), while LLM evaluations remained comparatively stable across these source conditio…

cs.AI, cs.CR · Recent · May 30, 2026

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

Yu-An Lu, Ci-Yang Tsai, Yu-Lin Tsai, Raluca Ada Popa +1 more

The paper introduces Reasoning Exposure Prompting (REP), a method that demonstrates that even when LLMs hide their internal reasoning steps from users, useful reasoning supervision can still be elicit…

cs.AI, cs.LG · Recent · May 28, 2026

When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs

Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more

Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.

cs.AI · Recent · May 28, 2026

TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation

Yundong Kim, Heyoung Yang

The paper introduces TRACE, a novel metric that evaluates the logical structure of LLM reasoning (CoT) by integrating Toulmin's argumentation theory, demonstrating that sound reasoning structure corre…

cs.AI, cs.CL, cs.HC · Recent · May 27, 2026

AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

Maharshi Gor, Yoo Yeon Sung, Yu Hou, Eve Fleisig +3 more

This study investigates human-AI collaboration in question answering, finding that while collaboration is beneficial, humans make suboptimal decisions by both under-relying on correct AI suggestions a…

cs.AI, cs.MA · Recent · May 28, 2026

AgentSchool: An LLM-Powered Multi-Agent Simulation for Education

Yulei Ye, Wenhao Li, Zhong Wen, Yunshu Huang +22 more

The paper introduces AgentSchool, an advanced LLM-powered multi-agent simulator that models learning as state transitions to provide a robust, ethically viable testbed for educational research and ped…

cs.CL · Empirical · Recent · Jun 4, 2026

Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?

Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more

This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.

cs.AI, cs.LG · Recent · May 29, 2026

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

Yunpeng Zhou

This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…

cs.AI · Recent · May 27, 2026

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

Yubo Li, Ramayya Krishnan, Rema Padman

The paper identifies a failure mode called unfaithful capitulation (UC), where reasoning models maintain a correct internal thought process (chain-of-thought) but output an incorrect final answer when…

cs.AI · Recent · May 28, 2026

MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains

Ashutosh Ojha, Vinay Aggarwal, Ashutosh Srivastava, Siddharth Yedlapati +2 more

MEMENTO proposes a novel framework that treats the open web as a continuous learning signal, enabling agents to acquire task-specific expertise and reusable research strategies in low-data domains wit…
