Your Multimodal Speech Model Says I Have a Face for Radio | ArxivCSExplorer