ConTrans: Learning Text-enhanced Local-global Temporal Representations for Zero-shot Temporal Action Localization