Papers similar to 2605.31287

~ similar to 2605.31287· 20 results

cs.CYcs.AIcs.HCRecentMay 29, 2026

Comparing LLM-Based Conversational and Graphical Interfaces for Industrial Decision Tasks: An Exploratory Mixed-Methods Study

Roberto Figliè, Simone Caputo, Alan Serrano, Tommaso Turchi +1 more

This study compared LLM-based conversational interfaces and traditional dashboards for industrial decision tasks, finding that while conversational agents reduce interactional effort, dashboards remai…

View →

cs.AIcs.LGRecentMay 28, 2026

When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs

Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more

Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.

View →

stat.OTcs.AIEmpiricalRecentJun 9, 2026

Flaws in the LLM Automation Narrative

George Perrett, Javae Elliott, Jennifer Hill, Marc Scott

This paper evaluates the performance of a Large Language Model (LLM) in a high-stakes context by comparing it to human experts and measuring variance and error magnitude.

View →

stat.OTcs.AIEmpiricalRecentJun 9, 2026

Flaws in the LLM Automation Narrative

George Perrett, Javae Elliott, Jennifer Hill, Marc Scott

This paper evaluates the performance of a Large Language Model (LLM) in a high-stakes context by comparing it to human experts and measuring variance and error magnitude.

View →

cs.CLcs.AIRecentMay 28, 2026

Adaptive Interviewing for Persona Simulation in LLMs: Evidence-Grounded Reasoning Improves Decision Alignment

Ruoxi Su, Yuhan Liu, Jingyu Hu

The paper introduces an adaptive interview framework to gather rich persona context, demonstrating that LLMs improve decision alignment in moral dilemmas only when they selectively ground their decisi…

View →

cs.AIRecentMay 30, 2026

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more

The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…

View →

cs.CLcs.AIcs.LGRecentMay 28, 2026

Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù +1 more

This paper empirically demonstrates that the choice of plan representation (e.g., checklist vs. narrative) significantly impacts the robustness and success rate of LLM-based web agents.

View →

cs.AIRecentMay 28, 2026

Temporal Stability and Few-Shot Prompting in Math Task Assessment

Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn

This study investigated the stability and prompt-responsiveness of AI tools in classifying the cognitive demand of math tasks, finding that few-shot prompting was a more reliable performance booster t…

View →

cs.AIcs.CERecentJun 1, 2026

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Maksym Chikita +3 more

The paper proposes the Interaction-Native Knowledge Harness (InKH), an architecture that absorbs complex context into financial LLM agents, significantly improving performance, reducing latency, and e…

View →

cs.CLRecentMay 28, 2026

Can LLM Teams Play What? Where? When?

Anastasia Kotelnikova, Viktor Byzov, Maria Dolzhenkova, Evgeny Kotelnikov

This paper investigates if team-based interaction improves LLM performance on complex reasoning tasks (ChGK), finding that structured team strategies significantly boost accuracy by acting as error-fi…

View →

cs.CLcs.AIRecentMay 31, 2026

CA-BED: Conversation-Aware Bayesian Experimental Design

Daniel Arnould, Rashad Aziz, Zixuan Kang, Tanav Changal +4 more

CA-BED is a novel framework that improves LLM performance in interactive question-answering by integrating Bayesian Experimental Design to strategically select questions that maximize information gain…

View →

cs.AIRecentMay 27, 2026

FundaPod: A Multi-Persona Agent Pod Platform with Knowledge Graph Memory for AI-Assisted Fundamental Investment Research

Di Zhu, Lei Nico Zheng, Zihan Chen

FundaPod is a multi-persona agent platform designed for fundamental investment research, enabling AI agents with distinct viewpoints to independently gather evidence and surface disagreements for huma…

View →

cs.HCcs.AIcs.CLRecentMay 28, 2026

Inform, Coach, Relate, Listen: Auditing LLM Caregiving Support Roles

Drishti Goel, Agam Goyal, Veda Duddu, Olivia Pal +7 more

This study demonstrates that an LLM's assigned support role (e.g., Inform, Coach, Relate) significantly alters its safety profile and the types of risks it presents when assisting users in complex car…

View →

cs.AIRecentMay 27, 2026

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

Zhikai Pan, Chih-Ting Liao, Chunrui Liu, Xi Xiao +4 more

The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memo…

View →

cs.AIcs.CLcs.HCRecentMay 27, 2026

AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

Maharshi Gor, Yoo Yeon Sung, Yu Hou, Eve Fleisig +3 more

This study investigates human-AI collaboration in question answering, finding that while collaboration is beneficial, humans make suboptimal decisions by both under-relying on correct AI suggestions a…

View →

cs.CLRecentMay 30, 2026

IDEAFix: Evaluation Framework for Creative Defixation Prompting in LLMs

F. Carichon, S. Sharma, M. Girard, R. Rampa +1 more

The paper introduces IDEAFix, a systematic evaluation framework designed to analyze how structured prompting and task design influence the divergent thinking and originality of idea generation in LLMs…

View →

cs.AIRecentMay 28, 2026

Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models

Arturo Valdivia, Paolo Burelli

This paper proposes a multi-agent framework using LLMs to improve collaborative story generation, demonstrating that an iterative Writer-Editor process significantly enhances narrative quality for you…

View →

cs.AIRecentMay 27, 2026

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie +6 more

The paper introduces AsyncTool, a new benchmark designed to evaluate LLM agents' ability to handle multiple, concurrent tasks with delayed tool feedback, demonstrating that asynchronous coordination i…

View →

cs.AIRecentMay 31, 2026

Large Language Models in Transportation Systems Management and Operations: From Text Reasoning to Multi-modal Decision Support

Siyan Li, Zehao Wang, Jiachen Li, Kanok Boriboonsomsin +2 more

This survey reviews how Large and Multi-modal Language Models (LLMs/MM-LLMs) are being applied to integrate diverse data sources for enhanced decision support in transportation systems management and…

View →

cs.AIRecentMay 27, 2026

An LLM-Based Assistance System for Intuitive and Flexible Capability-Based Planning

Luis Miguel Vieira da Silva, Nicolas König, Felix Gehlhoff

The paper proposes a hybrid LLM-based assistance system that enhances traditional capability-based planning by providing natural language interaction, interpretability, and flexible knowledge model ad…

View →