Ming Shi
2 indexed papers
Research Timeline
The paper introduces a queueing-theoretic framework to model dynamic cyber-attack surfaces, developing an adaptive reinforcement learning defense policy that significantly reduces active vulnerabilities and quantifies cumulative exposure risk.
SPADER is a novel reinforcement learning framework that addresses the challenges of Multi-Answer Question Answering by improving credit assignment and promoting diverse exploration during long-horizon tool use.
Papers
SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering
Qiming Shi, Zhaolu Kang, Yunfan Zhou, Di Weng +1 more