Ling Yang
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces OmniVerifier-M1, a multimodal meta-verifier that uses symbolic outputs and decoupled reinforcement learning to provide robust, fine-grained verification and error localization for large multimodal models.
HMPO introduces a single-stage, cost-effective reinforcement learning framework that achieves significant token compression of Chain-of-Thought reasoning with minimal loss of accuracy, applicable across various large language model architectures.
Papers
HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression
Minghui Zheng, Hongxu Chen, Huimin Ren, Hongsheng Xin +7 more
HMPO introduces a single-stage, cost-effective reinforcement learning framework that achieves significant token compression of Chain-of-Thought reasoning with minimal loss of accuracy, applicable acro…