Built by Teycir Ben Soltane
ArXivCSExplorer

Ming Li

33 indexed papers

Recent (6 mo): 33
With code: 0
Influential cites: 0
Benchmarked: 0

[Chart: Publications per year]

Top categories

AI ×22, Crypto ×12, NLP ×7, ML ×6, Vision ×5, Multimedia ×2, Info Retrieval ×1, Stats ML ×1

Frequent co-authors

Yiming Liu (4×)
Ming Liu (4×)
Tong Yang (4×)
Yiming Li (4×)
Ziming Li (3×)
Yaoming Li (3×)

Research Timeline

2026
LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

The paper argues that current search agents often verify existing knowledge rather than genuinely searching, and introduces LiveBrowseComp, a new benchmark to measure true evidence-driven discovery.

Reinforcement Learning with Robust Rubric Rewards

The paper introduces $\text{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robust rubric scoring.

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models

BORA is an offline-to-online RL framework that enhances dexterous VLA models for real-world robotics by using an action-conditioned critic and a lightweight residual adaptation mechanism to correct execution errors.

Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent

The paper introduces Compass, an expert-guided LLM agent framework that successfully extracts and integrates thousands of previously inaccessible marine lead records from vast corpora of scientific papers, creating a major new global database.

ESPO: Early-Stopping Proximal Policy Optimization

ESPO is a novel reinforcement learning algorithm that detects trajectory failure in large language models and terminates rollouts early, significantly improving performance on mathematical reasoning benchmarks while reducing computational cost.

ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression

ConMoE proposes a train-free method for compressing Mixture-of-Experts (MoE) models by consolidating the large expert pool into a smaller set of reusable prototypes and deterministically remapping all original expert calls to these prototypes.

Counterfactual Graph for Multi-Agent LLM Calibration

The paper proposes CAGE-CAL, a counterfactual graph calibration framework, to accurately assess the reliability and detect over-confidence in multi-agent LLM systems after agents communicate.

Cert-LAS: Toward Certified Model Ownership Verification for Text-to-Image Diffusion Models via Layer-Adaptive Smoothing

The paper proposes Cert-LAS, a novel certified method for verifying model ownership in text-to-image diffusion models, which is robust against malicious signal removal attacks.

StemBind: When MLLMs Get Lost Between Rules and Instances in Abstract Visual Reasoning

The paper introduces StemBind, a diagnostic benchmark that separates perception, rule induction, and answer selection in abstract visual reasoning, revealing that the primary failure point for MLLMs is the mapping of the identified rule to the correct instance.

ProtStructQA: A Denotation Threshold in Protein Structural Reasoning

The paper introduces ProtStructQA, an executable benchmark that tests protein structural reasoning by requiring language models to generate measurable 3D coordinates, revealing a capability-dependent transition point where chain-of-thought reasoning surpasses tool-mediated approaches.

Med-HEAL: Analyzing and Mitigating Hallucinations in Medical LLMs with Hallucination-Aware In-Context Learning

The paper introduces Med-HEAL, a comprehensive framework and dataset for systematically identifying and mitigating hallucinations in medical LLMs, demonstrating that a self-critique pipeline significantly improves model accuracy.

Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment

The paper introduces AdvCL, a framework that repurposes adversarial perturbations as a geometric control signal to stabilize continual learning in large language models, significantly reducing forgetting and enhancing robustness.

AdaCodec: A Predictive Visual Code for Video MLLMs

AdaCodec introduces a predictive visual coding scheme for video MLLMs, significantly improving efficiency and performance by transmitting only inter-frame changes and full reference frames when necessary.

Order within Chaos: Capturing Intrinsic Energy Anomalies for AI-Manipulated Image Forgery Localization

The paper proposes FLAME, a novel framework that detects AI-generated image forgeries by identifying intrinsic energy anomalies caused by the diffusion process, achieving state-of-the-art localization.

A Primer in Post-Training Reasoning Data: What We Know About How It Works

This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction, and scalability.

CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback

The paper proposes Credit-Attenuated Privileged Feedback (CAPF), a training-time mechanism that uses verifier-side information to guide LLM search agents, significantly improving their performance on complex QA tasks.

Token Predictors Are Not Planners: Building Physically Grounded Causal Reasoners

The paper argues that current embodied planning benchmarks prioritize superficial language prediction over true physical reasoning, introducing new benchmarks and a large-scale dataset to demonstrate that physically grounded causal reasoning is necessary for reliable autonomous agents.

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

This paper introduces Imaginative Perception Tokens (IPT) to improve spatial reasoning in vision language models.

A Quantitative Approximation Framework for Flow Distillation in Diffusion Models

The paper develops a quantitative framework to analyze and improve flow distillation in diffusion models, providing stability guarantees and suggesting non-uniform time scheduling to reduce approximation errors.

OneReason Technical Report

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coherent latent interests.


Papers

cs.IR, cs.AI, cs.CL · Recent · Jun 4, 2026

OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu +80 more

The paper proposes OneReason, a framework that enhances the reasoning capability of generative recommendation models by focusing on improving item perception and structuring user behavior into coheren…

cs.AI · Recent · Jun 2, 2026

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Mahtab Bigverdi, Lindsey Li, Weikai Huang, Yiming Liu +7 more

This paper introduces Imaginative Perception Tokens (IPT) to improve spatial reasoning in vision language models.

stat.ML, cs.LG · Recent · Jun 2, 2026

A Quantitative Approximation Framework for Flow Distillation in Diffusion Models

Weiguo Gao, Ming Li, Lei Shi, Hanfei Zhou

The paper develops a quantitative framework to analyze and improve flow distillation in diffusion models, providing stability guarantees and suggesting non-uniform time scheduling to reduce approximat…

cs.LG, cs.AI · Recent · Jun 1, 2026

Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment

Ran Liu, Min Yu, Mingqi Liu, Jianguo Jiang +6 more

The paper introduces AdvCL, a framework that repurposes adversarial perturbations as a geometric control signal to stabilize continual learning in large language models, significantly reducing forgett…

cs.CV, cs.AI, cs.CL · Recent · Jun 1, 2026

AdaCodec: A Predictive Visual Code for Video MLLMs

Haowen Hou, Zhen Huang, Zheming Liang, Qingyi Si +7 more

AdaCodec introduces a predictive visual coding scheme for video MLLMs, significantly improving efficiency and performance by transmitting only inter-frame changes and full reference frames when necess…

cs.CV, cs.AI · Recent · Jun 1, 2026

Order within Chaos: Capturing Intrinsic Energy Anomalies for AI-Manipulated Image Forgery Localization

Yiming Wang, Baiqi Wu, Qingming Li, Jiahao Chen +2 more

The paper proposes FLAME, a novel framework that detects AI-generated image forgeries by identifying intrinsic energy anomalies caused by the diffusion process, achieving state-of-the-art localization…

cs.CL, cs.AI · Recent · Jun 1, 2026

A Primer in Post-Training Reasoning Data: What We Know About How It Works

Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more

This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction…

cs.AI · Recent · Jun 1, 2026

CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback

Bin Chen, Xinye Liao, Yiming Liu, Xin Liao +1 more

The paper proposes Credit-Attenuated Privileged Feedback (CAPF), a training-time mechanism that uses verifier-side information to guide LLM search agents, significantly improving their performance on…

cs.AI · Recent · Jun 1, 2026

Token Predictors Are Not Planners: Building Physically Grounded Causal Reasoners

Zheng Lu, Mingqi Gao, Qinlei Xie, Wanqi Zhong +7 more

The paper argues that current embodied planning benchmarks prioritize superficial language prediction over true physical reasoning, introducing new benchmarks and a large-scale dataset to demonstrate…

cs.CL · Recent · May 31, 2026

Med-HEAL: Analyzing and Mitigating Hallucinations in Medical LLMs with Hallucination-Aware In-Context Learning

Yiming Liao, Zeno Franco, Jose Eduardo Lizarraga Mazaba, Keke Chen

The paper introduces Med-HEAL, a comprehensive framework and dataset for systematically identifying and mitigating hallucinations in medical LLMs, demonstrating that a self-critique pipeline significa…

cs.CL · Recent · May 30, 2026

ProtStructQA: A Denotation Threshold in Protein Structural Reasoning

Aravind Mandiga, Guoming Li, Jin Lu, Ismailcem Budak Arpinar +2 more

The paper introduces ProtStructQA, an executable benchmark that tests protein structural reasoning by requiring language models to generate measurable 3D coordinates, revealing a capability-dependent…

cs.CV, cs.AI · Recent · May 29, 2026

StemBind: When MLLMs Get Lost Between Rules and Instances in Abstract Visual Reasoning

Xixiang He, Baiqi Wu, Xingming Li, Ao Cheng +3 more

The paper introduces StemBind, a diagnostic benchmark that separates perception, rule induction, and answer selection in abstract visual reasoning, revealing that the primary failure point for MLLMs i…

cs.CV, cs.AI · Recent · May 28, 2026

Reinforcement Learning with Robust Rubric Rewards

Ya-Qi Yu, Hao Wang, Fangyu Hong, Xiangyang Qu +14 more

The paper introduces $\text{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robu…

cs.RO, cs.AI · Recent · May 28, 2026

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models

Zhongxi Chen, Yifan Han, Yanming Shao, Huanming Liu +4 more

BORA is an offline-to-online RL framework that enhances dexterous VLA models for real-world robotics by using an action-conditioned critic and a lightweight residual adaptation mechanism to correct ex…

cs.AI · Recent · May 28, 2026

Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent

Yiming Liu, Bin Lu, Meng Jin, Ziyuan Sang +5 more

The paper introduces Compass, an expert-guided LLM agent framework that successfully extracts and integrates thousands of previously inaccessible marine lead records from vast corpora of scientific pa…

cs.LG, cs.AI · Recent · May 28, 2026

ESPO: Early-Stopping Proximal Policy Optimization

Zihang Li, Rui Zhou, Yingcheng Shi, Wenhan Yu +7 more

ESPO is a novel reinforcement learning algorithm that detects trajectory failure in large language models and terminates rollouts early, significantly improving performance on mathematical reasoning b…

cs.AI · Recent · May 28, 2026

ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression

Yilun Yao, Jiaming Pan, Elsie Dai, Peizhuang Cong +2 more

ConMoE proposes a train-free method for compressing Mixture-of-Experts (MoE) models by consolidating the large expert pool into a smaller set of reusable prototypes and deterministically remapping all…

cs.CL · Recent · May 28, 2026

Counterfactual Graph for Multi-Agent LLM Calibration

Jiatan Huang, Mingchen Li, Ziming Li, Sunjae Kwon +2 more

The paper proposes CAGE-CAL, a counterfactual graph calibration framework, to accurately assess the reliability and detect over-confidence in multi-agent LLM systems after agents communicate.

cs.CR, cs.CV, cs.GR · Recent · May 28, 2026

Cert-LAS: Toward Certified Model Ownership Verification for Text-to-Image Diffusion Models via Layer-Adaptive Smoothing

Leyi Qi, Yiming Li, Siyuan Liang, Zhengzhong Tu +1 more

The paper proposes Cert-LAS, a novel certified method for verifying model ownership in text-to-image diffusion models, which is robust against malicious signal removal attacks.

cs.AI · Recent · May 27, 2026

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

HuiMing Fan, Xiao Wang, Zheng Chu, Qianyu Wang +4 more

The paper argues that current search agents often verify existing knowledge rather than genuinely searching, and introduces LiveBrowseComp, a new benchmark to measure true evidence-driven discovery.
