Papers similar to 2606.01224

similar to 2606.01224 · 20 results

cs.AI · Recent · May 28, 2026

Temporal Stability and Few-Shot Prompting in Math Task Assessment

Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn

This study investigated the stability and prompt-responsiveness of AI tools in classifying the cognitive demand of math tasks, finding that few-shot prompting was a more reliable performance booster t…

cs.AI, cs.CL, cs.HC · Recent · May 28, 2026

Surfacing Isolated Learners with Outcome-Independent Mediation of Feedback between Teachers and Students Using AI

Junsoo Park, Youssef Medhat, Htet Phyo Wai, Ploy Thajchayapong +1 more

The paper proposes an interpretable, AI-driven decision layer that ranks course topics needing attention using multiple student and teacher signals, successfully identifying learning gaps before forma…

cs.AI · Recent · May 28, 2026

KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning

Kun Feng, Ziwei Shan, Yuchen Fang, Yiyang Tan +5 more

KairosAgent is a novel agentic framework that combines Large Language Models (LLMs) for semantic reasoning and Time Series Foundation Models (TSFMs) for numerical forecasting, achieving superior multi…

cs.CY, cs.AI · Recent · May 31, 2026

Beyond Access: Guided LLM Scaffolding for Independent Learning in Undergraduate Statistics

Mohammad Amanlou, Yasaman Amou-Jafari, Mehrad Livian, Fatemeh Boloukazari +2 more

This study compares different levels of LLM access in a statistics course, finding that structured, guided use significantly improves students' reasoning skills and independent learning compared to un…

cs.AI · Recent · May 27, 2026

Practitioner Beliefs and Behaviors in AI-Enhanced Education: DOT Framework Survey Evidence

David Gibson, M. Elizabeth Azukas, Gerald Knezek

This study surveyed higher education practitioners to map their beliefs and behaviors regarding AI integration, finding that while they view AI favorably, institutional barriers and gaps in design-ori…

cs.CL · Recent · May 29, 2026

TeachObs: A Human-Validated Benchmark for Multimodal Teaching Observation and Model Evaluation

Yeil Jeong, Youngjin Yoo, Seobin Sohn, Hyejin Han +3 more

The paper introduces TeachObs, a comprehensive, human-validated benchmark for multimodal teaching observation, and evaluates frontier LLMs, finding that no single model consistently outperforms others…

cs.AI, cs.IR · Recent · May 27, 2026

From Learning Resources to Competencies: LLM-Based Tagging with Evidence and Graph Constraints

Ngoc Luyen Le, Marie-Hélène Abel, Bertrand Laforge

The paper introduces an LLM-based pipeline that tags learning resources with structured competencies, achieving strong performance while providing traceable evidence and leveraging graph constraints.

cs.CL · Recent · Jun 1, 2026

Encoded but Not Routed: Explaining the Table-Chart Gap in Scientific Claim Verification

Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more

The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…

cs.CV · Recent · Jun 1, 2026

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

Junhao Cheng, Liang Hou, Tianxiong Zhong, Xin Tao +3 more

The paper proposes using Vision-Language Models (VLMs) as 'teachers' to guide Video Generation Models (VGMs) during test-time optimization, significantly improving video reasoning capabilities.

cs.LG, cs.AI · Recent · May 31, 2026

What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression

Wendao Wu, Fangqing Zhang, Haihan Zhang, Cong Fang

This paper develops a unified spectral analysis framework to explain how knowledge transfer (KT) works across different machine learning regimes, such as Knowledge Distillation and Weak-to-Strong gene…

cs.AI · Recent · May 30, 2026

KACE: Knowledge-Adaptive Context Engineering for Mathematical Reasoning

Jayant Parashar, Suchendra M. Bhandarkar

KACE introduces a novel knowledge-adaptive context engineering framework that separates knowledge storage from usage, significantly improving mathematical reasoning accuracy on challenging benchmarks…

cs.AI · Recent · May 28, 2026

Double-Edged Sword or Sharp Tool? Designing and Evaluating Triadic LLM-Teacher Collaboration for K-12 Writing at Scale

Canran Wang, Yuwen Yang, Zhen Wang, Ming Ma +4 more

The paper designs and evaluates a triadic LLM-Teacher collaboration system for K-12 writing, finding that strategic labor division between the LLM and teacher effectively improves writing quality but…

cs.AI, cs.CR · Recent · May 15, 2026

GRID: Graph Representation of Intelligence Data for Security Text Knowledge Graph Construction

Liangyi Huang, Zichen Liu, Fei Shao, Shang Ma +4 more

The paper introduces GRID, an end-to-end framework that significantly improves the construction of security knowledge graphs from cyber threat intelligence by replacing unstable LLM-based supervision…

cs.CL · Recent · Jun 1, 2026

When Rating Scales Fall Short: LLM-Assisted Discovery of ADHD Signals in Turkish Teacher Narratives

Baris Karacan, Irem Aktar Songur, Ahmet Ozaslan, Elvan Iseri

This study demonstrates that analyzing open-ended teacher narratives, using LLM-assisted theme discovery, can uncover distinct behavioral signals related to ADHD that are missed by traditional, struct…

cs.AI · Recent · May 28, 2026

MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains

Ashutosh Ojha, Vinay Aggarwal, Ashutosh Srivastava, Siddharth Yedlapati +2 more

MEMENTO proposes a novel framework that treats the open web as a continuous learning signal, enabling agents to acquire task-specific expertise and reusable research strategies in low-data domains wit…

cs.AI, cs.LG · Recent · May 29, 2026

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

Yunpeng Zhou

This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…

stat.ML, cs.CR, cs.LG · Recent · Apr 5, 2026

The Hiremath Early Detection (HED) Score: A Measure-Theoretic Evaluation Standard for Temporal Intelligence

Prakul Sunil Hiremath

The paper introduces the Hiremath Early Detection (HED) Score, a new measure-theoretic standard that accurately quantifies the time-value of early detection, significantly outperforming traditional me…

cs.CR, cs.AI · Recent · May 18, 2026

Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks

Doohee You

The paper proposes the Triple-tier Anomaly Defense (TRIAD) framework, a predictive model that treats safety verification as a dynamic trajectory problem to detect cumulative, cross-modal poisoning in…

cs.CL · Recent · May 28, 2026

COMPOSE: Composing Future Theorems from Citations and Formal Structure

David Busbib, Michael Werman

The paper introduces COMPOSE, a dual-graph framework that generates plausible future mathematical theorems by simultaneously conditioning a language model on both the scientific citation context and t…

cs.CL, cs.AI · Recent · Jun 1, 2026

Identifying High-Confidence Social Biases in LLMs for Trustworthy Conversational Tutoring Agents

Aitor Arronte Alvarez, Naiyi Xie Fincham

This study evaluates LLMs in conversational tutoring to identify high-confidence social biases, finding that state-of-the-art models are often overconfident in their incorrect assessments of stereotyp…
