Aditi Raghunathan
2 indexed papers
Research Timeline
The paper introduces Terminal Wrench, a dataset of 331 reward-hackable terminal-agent environments and 3,632 exploit trajectories, and shows that reward-hacking detection degrades significantly when reasoning traces are removed.
The paper proposes Self-Trained Verification (STV), a method that trains verifiers to catch self-generated errors by leveraging reference solutions, and reports significant gains in both test-time refinement and training-time self-improvement.
Papers
Self-Trained Verification for Training- and Test-Time Self-Improvement