Yang Li
50 indexed papers
Research Timeline
This paper introduces the concept of Budget-Aware Agents (BAGEN), showing that current LLM agents often fail to manage resources proactively, and proposes that incorporating early stop and interval estimation significantly improves efficiency.
The paper proposes CAST, an answer-free self-distillation method that enhances Group Relative Policy Optimization (GRPO) for verifiable rewards, allowing token-level advantage signals even when all sampled trajectories are uniformly correct or incorrect.
SpecDB is a novel system that uses LLMs to synthesize highly customized, purpose-built relational databases, achieving performance comparable to commercial systems while significantly reducing code size.
PaCo-VLA introduces a passivity-shielded compliance prior to safely bridge the gap between high-level Vision-Language-Action (VLA) semantic outputs and low-level, force-sensitive robotic control.
This paper introduces personalized empathy, a capability for LLMs to adapt empathetic strategies based on individual user history, and proposes PereGRM, a reward modeling framework that significantly enhances this personalized empathy.
The paper proposes LaSR, a context-aware training paradigm that uses latent reasoning to significantly improve speech recognition, especially for specialized terminology, without adding latency.
The paper introduces MAAD, a multi-agent framework that autonomously transforms software requirements into comprehensive, multi-view architectural blueprints, significantly improving completeness and reducing manual validation.
SIRIUS-SQL introduces a robust multi-candidate text-to-SQL system that addresses weaknesses in candidate generation, error handling, and selection, achieving state-of-the-art performance on complex benchmarks.
The paper introduces APEIRIA, a neuro-symbolic 3D Multi-modal LLM that bridges the gap between interpretable symbolic reasoning and flexible, open-vocabulary 3D understanding.
This paper introduces interpretability-guided, training-free interventions that systematically improve the accuracy and controllability of latent reasoning in LLMs by leveraging structural and causal insights into continuous hidden states.
SIRI introduces a self-internalizing reinforcement learning framework that allows LLM agents to autonomously discover and integrate reusable skills directly into their core policy, significantly improving performance on complex tasks without external skill generators.
RASER introduces a family of cheap, router-based systems that selectively decide whether to perform expensive multi-hop retrieval, significantly reducing LLM token costs while maintaining state-of-the-art performance.
The paper proposes a training-free framework, Visual Representation-Guided Video-LLM Reasoning, to perform composed video retrieval by using visual examples and text instructions, achieving strong performance on the CVPR 2026 challenge.
The paper introduces CASTER, a new human-centric task for evaluating User-Generated Content (UGC) resonance, and proposes MEDEA, an architecture that uses a Social Chain-of-Thought mechanism to simulate community reactions for quality assessment.
The paper demonstrates that explicit gender cues systematically affect LLM value trade-offs, causing decision flips that are often masked or misattributed by the models themselves.
The paper proposes a novel framework, LPCD, that uses latent causal modeling to robustly assess evolving adversarial risks in live streaming by decoupling malicious intent from superficial tactical shifts.
The paper introduces Distributed Semantic Recomposition (DSR), a novel cross-modal jailbreaking framework that bypasses existing safety filters by decomposing harmful intent into benign input components, achieving high attack success rates with low input toxicity.
NeuroArmor is a white-box runtime defense that uses prompt-specific safe variants to selectively detect and mitigate jailbreak attacks, significantly reducing attack success rates while maintaining a low false positive rate.
This paper studies how to scale robust robot policies by expanding physical domains in a recoverable way.
This paper introduces Repeated Policy Regret (RP-Regret), a novel game-theoretic metric for analyzing regret in repeated games with adaptive opponents, and proposes algorithms to minimize it.
Papers
Regret Minimization with Adaptive Opponents in Repeated Games
This paper introduces Repeated Policy Regret (RP-Regret), a novel game-theoretic metric for analyzing regret in repeated games with adaptive opponents, and proposes algorithms to minimize it.