Escaping the Mode Lottery: Multi-Response Training Improves Language Model Generalization