Ming Li
33 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper argues that current search agents often verify existing knowledge rather than genuinely searching, and introduces LiveBrowseComp, a new benchmark to measure true evidence-driven discovery.
The paper introduces $ ext{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robust rubric scoring.
BORA is an offline-to-online RL framework that enhances dexterous VLA models for real-world robotics by using an action-conditioned critic and a lightweight residual adaptation mechanism to correct execution errors.
The paper introduces Compass, an expert-guided LLM agent framework that successfully extracts and integrates thousands of previously inaccessible marine lead records from vast corpora of scientific papers, creating a major new global database.
ESPO is a novel reinforcement learning algorithm that detects trajectory failure in large language models and terminates rollouts early, significantly improving performance on mathematical reasoning benchmarks while reducing computational cost.
ConMoE proposes a train-free method for compressing Mixture-of-Experts (MoE) models by consolidating the large expert pool into a smaller set of reusable prototypes and deterministically remapping all original expert calls to these prototypes.
The paper proposes CAGE-CAL, a counterfactual graph calibration framework, to accurately assess the reliability and detect over-confidence in multi-agent LLM systems after agents communicate.
The paper proposes Cert-LAS, a novel certified method for verifying model ownership in text-to-image diffusion models, which is robust against malicious signal removal attacks.
The paper introduces StemBind, a diagnostic benchmark that separates perception, rule induction, and answer selection in abstract visual reasoning, revealing that the primary failure point for MLLMs is the mapping of the identified rule to the correct instance.
The paper introduces ProtStructQA, an executable benchmark that tests protein structural reasoning by requiring language models to generate measurable 3D coordinates, revealing a capability-dependent transition point where chain-of-thought reasoning surpasses tool-mediated approaches.
The paper introduces Med-HEAL, a comprehensive framework and dataset for systematically identifying and mitigating hallucinations in medical LLMs, demonstrating that a self-critique pipeline significantly improves model accuracy.
The paper introduces AdvCL, a framework that repurposes adversarial perturbations as a geometric control signal to stabilize continual learning in large language models, significantly reducing forgetting and enhancing robustness.
AdaCodec introduces a predictive visual coding scheme for video MLLMs, significantly improving efficiency and performance by transmitting only inter-frame changes and full reference frames when necessary.
The paper proposes FLAME, a novel framework that detects AI-generated image forgeries by identifying intrinsic energy anomalies caused by the diffusion process, achieving state-of-the-art localization.
This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction, and scalability.
The paper proposes Credit-Attenuated Privileged Feedback (CAPF), a training-time mechanism that uses verifier-side information to guide LLM search agents, significantly improving their performance on complex QA tasks.
The paper argues that current embodied planning benchmarks prioritize superficial language prediction over true physical reasoning, introducing new benchmarks and a large-scale dataset to demonstrate that physically grounded causal reasoning is necessary for reliable autonomous agents.
This paper introduces Imaginative Perception Tokens (IPT) to improve spatial reasoning in vision language models.
The paper develops a quantitative framework to analyze and improve flow distillation in diffusion models, providing stability guarantees and suggesting non-uniform time scheduling to reduce approximation errors.
The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.
Papers
OneReason Technical Report
OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu +80 more
The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coheren…