~ similar to 2605.28360 · 20 results
Wenhang Shi, Yiren Chen, Shuqing Bian, Zhe Zhao +4 more
The paper introduces State-Adaptive Prompt Optimization (SAPO), a novel training strategy that treats prompts as dynamic variables to achieve robust fine-tuning, significantly mitigating catastrophic…
Wenwu Li, Yuran Song, Mingze Zhao, Bo Jin +1 more
The paper proposes a novel temporal and structural credit assignment framework to efficiently optimize multi-agent LLM systems by decomposing the error signal and using targeted, discrete gradient upd…
The paper introduces Prompted Policy Optimization (PromptPO), an LLM-based method that successfully optimizes policies for various sequential RL tasks, demonstrating that LLMs can replace classical RL…
This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…
The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.
Xin Su, Dawid Majchrowski, Fangyuan Yu, Vanshil Atul Shah +4 more
The paper introduces Hybrid Verified Decoding, a method that predicts the acceptance length of a cache draft to intelligently select between cache verification and model-based drafting, achieving sign…
Ruihang Lai, Hao Kang, Haozhan Tang, Akaash R. Parthasarathy +5 more
The paper introduces PithTrain, a compact, agent-native Mixture-of-Experts (MoE) training framework that significantly improves agent-task efficiency compared to existing production stacks.
Minghui Zheng, Hongxu Chen, Huimin Ren, Hongsheng Xin +7 more
HMPO introduces a single-stage, cost-effective reinforcement learning framework that achieves significant token compression of Chain-of-Thought reasoning with minimal loss of accuracy, applicable acro…
Xucong Wang, Ziyu Ma, Yong Wang, Yuxiang Ji +4 more
This paper proposes Agentic Procedural Policy Optimization (APPO), a new method for agentic reinforcement learning that improves tool-use capabilities by assigning credit to fine-grained decisio…
The paper proposes using Maximum Independent Set (MIS) algorithms on similarity graphs to select a maximally diverse and non-redundant subset of prompts for LLM benchmarking, achieving consistent rank…
Haochen Yang, Ke Zhao, Mengyuan Ma, Xingyu Lu +2 more
OptSkills introduces an archetype-centric skill learning agent that improves the generalization of solving optimization problems from natural language by clustering problems by underlying archetypes a…
FPMoE introduces a sparse Mixture-of-Experts (MoE) architecture to improve functional code generation across multiple functional programming languages, achieving state-of-the-art performance with fewe…
The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…
Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more
The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…
The study found that while multi-agent LLM code generation architectures significantly affect code complexity, the added complexity does not translate into better functional correctness, suggesting ar…
Hao Yang, Zhuo Ma, Yang Liu, Yilong Yang +2 more
The paper introduces CrossMPI, a novel cross-modal prompt injection attack that uses image-only perturbations to steer the interpretation of both textual and visual inputs in Large Vision-Language Mod…
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…
The paper introduces eXTC, a novel framework that combines structured prompt optimization, knowledge distillation, and reinforcement learning to create a highly performant and fully interpretable text…