Papers similar to 2605.30151

20 results

cs.AI · Recent · May 27, 2026

Practitioner Beliefs and Behaviors in AI-Enhanced Education: DOT Framework Survey Evidence

David Gibson, M. Elizabeth Azukas, Gerald Knezek

This study surveyed higher education practitioners to map their beliefs and behaviors regarding AI integration, finding that while they view AI favorably, institutional barriers and gaps in design-ori…


cs.AI, cs.LG · Recent · May 28, 2026

When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs

Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more

Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.


cs.AI, cs.CL, cs.HC · Recent · May 28, 2026

Surfacing Isolated Learners with Outcome-Independent Mediation of Feedback between Teachers and Students Using AI

Junsoo Park, Youssef Medhat, Htet Phyo Wai, Ploy Thajchayapong +1 more

The paper proposes an interpretable, AI-driven decision layer that ranks course topics needing attention using multiple student and teacher signals, successfully identifying learning gaps before forma…


cs.CL · Recent · May 30, 2026

IDEAFix: Evaluation Framework for Creative Defixation Prompting in LLMs

F. Carichon, S. Sharma, M. Girard, R. Rampa +1 more

The paper introduces IDEAFix, a systematic evaluation framework designed to analyze how structured prompting and task design influence the divergent thinking and originality of idea generation in LLMs…


cs.CL, cs.AI, cs.LG · Recent · May 29, 2026

Skill Availability and Presentation Granularity in Large-Language-Model Agents: A Controlled SkillsBench Study

Xiaonan Xu, Wenjing Wu

The study found that providing skills to LLM agents significantly boosts task success, but the specific granularity of how those skills are presented (e.g., low vs. high abstraction) has only small, u…


cs.CV, cs.AI, cs.CL · Recent · May 29, 2026

Benchmarking and Enhancing Text-to-Image Models for Generating Visual Representations in Early Arithmetic Education

Junling Wang, Boqi Chen, Heejin Do, Mubashara Akhtar +2 more

The paper introduces a new benchmark, E2V-Bench, to evaluate text-to-image models on generating pedagogically accurate visuals from arithmetic equations, finding that current models often fail due to…


cs.CL, cs.AI · Recent · May 28, 2026

Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models

Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more

The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…


cs.CR, cs.AI · Recent · Jun 2, 2026

"Important You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

Hang Li, Fedor Filippov, Yuling Lin, Pengfei He +5 more

This paper investigates the vulnerability of LLM-based automatic grading systems to prompt injection (PI) attacks, demonstrating that current systems are highly susceptible to manipulation that can le…


cs.CV, cs.AI · Recent · Jun 1, 2026

Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains

Garvin Guo, Donglei Yu, Yu Chen, Xiang Wang +5 more

The paper argues that observed gains in multimodal agents using tools may be due to learning tool-calling patterns rather than genuine capability expansion, finding that tool access provides little co…


cs.CV · Recent · Jun 1, 2026

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

Junhao Cheng, Liang Hou, Tianxiong Zhong, Xin Tao +3 more

The paper proposes using Vision-Language Models (VLMs) as 'teachers' to guide Video Generation Models (VGMs) during test-time optimization, significantly improving video reasoning capabilities.


cs.CR, cs.AI · Recent · Apr 6, 2026

Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework

Jiling Zhou, Aisvarya Adeseye, Seppo Virtanen, Antti Hakkala +1 more

The paper proposes a structured prompt engineering framework to enhance the integrity and reliability of Chain-of-Thought (CoT) reasoning in LLMs, demonstrating significant improvements in security-se…


cs.AI, cs.LG · Recent · May 29, 2026

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

Yunpeng Zhou

This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…


cs.CL, cs.LG · Recent · May 29, 2026

Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement

Riju Marwah, Ritvik Garimella, Vishal Pallagani, Atishay Jain +2 more

The paper formalizes LLM degradation during long generation as 'cognitive fatigue' and introduces the Fatigue Index (FI), a measurable, model-agnostic diagnostic tool for real-time monitoring.


cs.LG, cs.AI · Recent · May 27, 2026

FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks

Nishal Thomas, Noel Thomas

The paper introduces FormInv, a measurement protocol that reveals significant semantic inconsistencies in existing mathematical reasoning benchmarks, showing that standard accuracy metrics fail to cap…


cs.AI · Recent · May 27, 2026

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

Tomer Keren, Nitay Calderon, Asaf Yehudai, Yotam Perlitz +2 more

The paper introduces TASTE, an automatic task synthesis method that generates challenging agent benchmarks by evolving tool sequences, demonstrating that existing benchmarks are saturated and that TAS…


cs.CY, cs.AI · Recent · May 31, 2026

Beyond Access: Guided LLM Scaffolding for Independent Learning in Undergraduate Statistics

Mohammad Amanlou, Yasaman Amou-Jafari, Mehrad Livian, Fatemeh Boloukazari +2 more

This study compares different levels of LLM access in a statistics course, finding that structured, guided use significantly improves students' reasoning skills and independent learning compared to un…


cs.AI · Recent · May 31, 2026

The Case for Model Science: Verify, Explore, Steer, Refine

Przemyslaw Biecek, Luca Longo, Jianlong Zhou, Thomas Fel +2 more

The paper advocates for the establishment of Model Science, a systematic discipline that moves beyond simple benchmarking to deeply analyze AI models' internal workings and failure modes.


cs.AI · Recent · May 31, 2026

Advanced Mathematics Learning Behavior Prediction and Academic Early Warning Model Based on Multimodal Data Analysis

Liu Qiong, Li Zhengbo

The paper proposes a multimodal data analytics framework combining knowledge graphs and temporal modeling to accurately predict advanced mathematics learning difficulties and provide early academic wa…


cs.AI, cs.CL, cs.LO · Recent · May 27, 2026

Risk-Controlled Lean-as-Judge for Natural-Language Mathematical Reasoning

Pauline Bourigault, Xiaotong Ji, Matthieu Zimmer, Rasul Tutunov +1 more

The paper introduces COVCAL, a risk-controlled method that precisely determines when a partial formalization signal from an autoformalizer can be trusted to certify the correctness of natural-language…


cs.CL, cs.AI, cs.LG · Recent · May 30, 2026

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez

The paper demonstrates that LLM performance in zero-shot annotation is significantly limited by the alignment between the model's internal understanding and the task definition, showing that prompt-ba…

