Tong Wu
3 indexed papers
Research Timeline
This study uses a BERT-based LLM to analyze Discord sentiment and combines it with financial data to build a multi-modal model that significantly improves the prediction of Decentraland's MANA token price.
The paper introduces FraudBench, a multimodal benchmark designed to detect AI-generated fraudulent refund evidence, finding that current AI models struggle significantly with claim-conditioned fake-damage detection.
The paper introduces SAVE, a framework that uses on-policy feedback and the value function to self-supervise and improve reward models, significantly enhancing RLHF performance across multiple benchmarks.
Papers
The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement
Xiaobo Wang, Tong Wu, Min Tang, Jiaqi Li +2 more