~ similar to 2606.00875· 19 results
The paper compares anchorless methods for diversifying LLM-generated idea pools against traditional anchor-dependent methods, finding that semantic direction stratification offers the best balance of…
This paper unifies the fragmented field of Tree-of-Thoughts (ToT) reasoning by mapping LLM-based search processes onto a formal taxonomy derived from classical heuristic search theory.
Xu Li, Hanzhe Tu, Xinyi Li, Kuncheng Zhao +2 more
EvoGens is an evolution-inspired framework that treats scientific idea generation as an evolutionary search, significantly boosting the novelty and diversity of generated research ideas compared to ex…
The paper introduces a novel, per-token feature derived from how sampling temperature reshapes the token distribution, demonstrating it is a significantly stronger predictor of LLM creativity than sta…
This paper proposes a multi-agent framework using LLMs to improve collaborative story generation, demonstrating that an iterative Writer-Editor process significantly enhances narrative quality for you…
This study investigated the stability and prompt-responsiveness of AI tools in classifying the cognitive demand of math tasks, finding that few-shot prompting was a more reliable performance booster t…
Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more
Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.
The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…
Zhenlin Hu, Yan Wang, Zhen Bi, Zihao Xue +6 more
The paper introduces StreamSynth, a sequential setting for synthetic data generation, and proposes SynLearner, a framework that enables LLMs to improve synthesis performance by accumulating and transf…
Tomer Keren, Nitay Calderon, Asaf Yehudai, Yotam Perlitz +2 more
The paper introduces TASTE, an automatic task synthesis method that generates challenging agent benchmarks by evolving tool sequences, demonstrating that existing benchmarks are saturated and that TAS…
Jiling Zhou, Aisvarya Adeseye, Seppo Virtanen, Antti Hakkala +1 more
The paper proposes a structured prompt engineering framework to enhance the integrity and reliability of Chain-of-Thought (CoT) reasoning in LLMs, demonstrating significant improvements in security-se…
The paper proposes a comprehensive benchmark to systematically audit how varying persona prompts and model choices affect the technical quality and social representativeness of scholar recommendations…
The paper demonstrates that LLM performance in zero-shot annotation is significantly limited by the alignment between the model's internal understanding and the task definition, showing that prompt-ba…
The paper identifies 'memory-induced tool-drift,' a systematic vulnerability where personality biases stored in an LLM agent's memory silently corrupt tool-calling decisions, even when those biases ar…
Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more
This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.
Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more
This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.
This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…
The study found that while multi-agent LLM code generation architectures significantly affect code complexity, the added complexity does not translate into better functional correctness, suggesting ar…
Yangfan Ye, Xiaocheng Feng, Jialong Tang, Xiayu Cao +4 more
The paper introduces CultureForest, a new benchmark for evaluating Cultural Norm Grounded Reasoning in LLMs, demonstrating that models struggle to apply their cultural knowledge effectively in realist…