Variational Adapter for Cross-modal Similarity Representation