Francis Ferraro
2 indexed papers
Research Timeline
DecomposeRL introduces an accurate and traceable claim verification model by framing claim decomposition as a reinforcement learning (RL) policy, achieving state-of-the-art performance with significantly fewer training resources.
The paper introduces Trajectory-aware OPD (TOPD), a method that improves On-Policy Distillation (OPD) by using near-future trajectory information to identify true reasoning divergences more accurately and to provide guidance at those points, significantly boosting model performance.
Papers
Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance