Papers similar to 2606.00875

Similar to 2606.00875 · 19 results

cs.AI · Recent · May 28, 2026

Anchorless Diversification for Parallel LLM Ideation

Fares Nabil Ibrahim, Nafis Saami Azad, Raiyan Abdul Baten

The paper compares anchorless methods for diversifying LLM-generated idea pools against traditional anchor-dependent methods, finding that semantic direction stratification offers the best balance of…

cs.AI, cs.LG · Recent · May 27, 2026

Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns

Guni Sharon

This paper unifies the fragmented field of Tree-of-Thoughts (ToT) reasoning by mapping LLM-based search processes onto a formal taxonomy derived from classical heuristic search theory.

cs.CL · Recent · May 29, 2026

EvoGens: A Population-Based Heuristic Search Framework for Scientific Idea Generation

Xu Li, Hanzhe Tu, Xinyi Li, Kuncheng Zhao +2 more

EvoGens is an evolution-inspired framework that treats scientific idea generation as an evolutionary search, significantly boosting the novelty and diversity of generated research ideas compared to ex…

cs.CL · Recent · May 31, 2026

Before and After Temperature: A Distributional View of Creative LLM Generation

V. S. Raghu Parupudi, Harsha Ponnada, Aditi Kaushal, S. Shria Parupudi +2 more

The paper introduces a novel, per-token feature derived from how sampling temperature reshapes the token distribution, demonstrating it is a significantly stronger predictor of LLM creativity than sta…

cs.AI · Recent · May 28, 2026

Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models

Arturo Valdivia, Paolo Burelli

This paper proposes a multi-agent framework using LLMs to improve collaborative story generation, demonstrating that an iterative Writer-Editor process significantly enhances narrative quality for you…

cs.AI · Recent · May 28, 2026

Temporal Stability and Few-Shot Prompting in Math Task Assessment

Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn

This study investigated the stability and prompt-responsiveness of AI tools in classifying the cognitive demand of math tasks, finding that few-shot prompting was a more reliable performance booster t…

cs.AI, cs.LG · Recent · May 28, 2026

When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs

Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more

Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.

cs.CL, cs.CR · Recent · May 9, 2026

BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence

Jialing Gan, Junhao Dong, Songze Li

The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…

cs.AI · Recent · May 28, 2026

Make LLM Learn to Synthesize from Streaming Experiences through Feedback

Zhenlin Hu, Yan Wang, Zhen Bi, Zihao Xue +6 more

The paper introduces StreamSynth, a sequential setting for synthetic data generation, and proposes SynLearner, a framework that enables LLMs to improve synthesis performance by accumulating and transf…

cs.AI · Recent · May 27, 2026

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

Tomer Keren, Nitay Calderon, Asaf Yehudai, Yotam Perlitz +2 more

The paper introduces TASTE, an automatic task synthesis method that generates challenging agent benchmarks by evolving tool sequences, demonstrating that existing benchmarks are saturated and that TAS…

cs.CR, cs.AI · Recent · Apr 6, 2026

Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework

Jiling Zhou, Aisvarya Adeseye, Seppo Virtanen, Antti Hakkala +1 more

The paper proposes a structured prompt engineering framework to enhance the integrity and reliability of Chain-of-Thought (CoT) reasoning in LLMs, demonstrating significant improvements in security-se…

cs.IR, cs.AI, cs.CY · Recent · May 27, 2026

Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation

Annabella Sánchez-Guzmán, Lukas Eberhard, Denis Helic, Lisette Espín-Noboa

The paper proposes a comprehensive benchmark to systematically audit how varying persona prompts and model choices affect the technical quality and social representativeness of scholar recommendations…

cs.CL, cs.AI, cs.LG · Recent · May 30, 2026

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez

The paper demonstrates that LLM performance in zero-shot annotation is significantly limited by the alignment between the model's internal understanding and the task definition, showing that prompt-ba…

cs.CR, cs.LG · Recent · May 24, 2026

Memory-Induced Tool-Drift in LLM Agents

Mahavir Dabas, Jihyun Jeong, Ming Jin, Ruoxi Jia

The paper identifies 'memory-induced tool-drift,' a systematic vulnerability where personality biases stored in an LLM agent's memory silently corrupt tool-calling decisions, even when those biases ar…

cs.CL · Empirical · Recent · Jun 4, 2026

Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?

Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin +4 more

This paper investigates whether adults' struggles with conjunctive causal rules persist when they have agency through active exploration.

cs.AI, cs.LG · Recent · May 27, 2026

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Kohsei Matsutani, Gouki Minegishi, Takeshi Kojima, Yusuke Iwasawa +1 more

This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…

cs.SE, cs.AI, cs.LG · Recent · May 29, 2026

How Generation Architecture Shapes Code Complexity in Multi-Agent LLM Systems: A Paired Study on HumanEval

Nazmus Ashrafi

The study found that while multi-agent LLM code generation architectures significantly affect code complexity, the added complexity does not translate into better functional correctness, suggesting ar…

cs.CL · Recent · Jun 1, 2026

CultureForest: Understanding and Evaluating Cultural Norm Grounded Reasoning in LLMs

Yangfan Ye, Xiaocheng Feng, Jialong Tang, Xiayu Cao +4 more

The paper introduces CultureForest, a new benchmark for evaluating Cultural Norm Grounded Reasoning in LLMs, demonstrating that models struggle to apply their cultural knowledge effectively in realist…
