Multi-modal Video Representation Alignment for Robust Self-supervised Driver Distraction Detection