Pan Zhou
2 indexed papers
Research Timeline
BadSkill is a backdoor attack formulation that targets third-party agent skills by poisoning their embedded model artifacts, achieving high attack success rates across a range of model sizes.
HMPO introduces a single-stage, cost-effective reinforcement learning framework that achieves significant token compression of Chain-of-Thought reasoning with minimal loss of accuracy, applicable across various large language model architectures.
Papers
HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression
Minghui Zheng, Hongxu Chen, Huimin Ren, Hongsheng Xin +7 more