Pretraining Recurrent Networks without Recurrence