~ similar to 2606.00125· 8 results
Xiangyi Chen, Zelun Wang, Xinyi Li, Yi-Ping Hsu +2 more
The paper proposes PrefixMem, a dedicated encoder for Semantic IDs (SIDs), demonstrating that structured, prefix-conditioned representations significantly improve the accuracy and recall of generative…
Daeyong Kwon, Qiyu Wu, Shinobu Kuriya, Junghyun Koo +5 more
The paper introduces MusTBENCH, a new benchmark, and MusT, an optimization recipe, to rigorously test and improve the ability of Large Audio-Language Models (LALMs) to accurately ground their musical…
Hui Yang, Daiwei He, Kevin Jiang, Taejin Park +19 more
The paper introduces a novel paradigm where a fine-tuned LLM acts as an ancillary predictor to forecast likely advertisers, significantly improving ad recommendation systems by augmenting candidate ge…
Zixin Zhang, Fan Qi, Shuai Li, Xiaoshan Yang +1 more
The paper proposes FedMChain, a novel federated learning framework that structures multimodal training into sequential phases to mitigate modality competition and improve model performance while reduc…
Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos +1 more
The paper proposes an inference-time activation steering framework, utilizing orthogonalization, to achieve fine-grained, deterministic control over discrete musical attributes like Pitch and Duration…
This study demonstrates that the tone of a prompt significantly affects the accuracy of various LLMs, requiring users to exercise caution regarding tone-robust reliability.
Weizhi Zhang, Wooseong Yang, Yuxin Cui, Zhaohui Guo +8 more
The paper advocates for integrating explicit contextual feedback (like reviews and comments) into LLM-based recommender systems to achieve more personalized, transparent, and semantically aligned reco…
The paper introduces COMET, a novel PLS-SVD framework, to analyze the audio-text modality gap in CLAP models, showing that shared concepts are captured by a small subset of axes, and proposes a spectr…