~ similar to 2606.02341 · 9 results
Echo is a joint-embedding predictive architecture that uses a single, pretrained ViT encoder to simultaneously perform speaker diarization, speech recognition, and dynamic source separation in a share…
Salim I. Amoukou, Emanuele Albini, Tom Bewley, Saumitra Mishra +1 more
The paper introduces Entropic Projection Alignment (EPA), a unified framework that estimates, explains, and improves model performance under distribution shift by aligning source and target distributi…
Pengcheng Zhou, Pianran Guo, Shuhua Chen, Mengqin Zhao +2 more
The paper proposes Domain-Aware Sharpness Minimization (DASM), a novel optimizer that enhances the robustness and generalization of voice stream steganalysis models across varying data distributions.
The paper introduces an adaptive reservoir computing framework that tailors Echo State Networks (ESNs) to specific evaluation scenarios, achieving a high score on the CTF-4-Science Lorenz benchmark fo…
The paper introduces Morlet Positional Encoding (MoPE), a novel wavelet-based positional encoding that models position and locality simultaneously, outperforming standard sinusoidal and RoPE methods.
The paper proposes the Morlet Spectral Transformer (MST), a novel architecture that effectively decodes cross-subject emotion from EEG by designing specialized spectral and spatial representations, ou…
This study empirically benchmarks classical and quantum machine learning models for image recognition, finding that while quantum models offer superior accuracy and resource efficiency at high dimensi…
Yifan Liao, Zongmin Zhang, Zhen Sun, Yuhui Sun +2 more
The paper introduces a novel Clean-Referenced Feature-Vocoder Attack, a black-box adversarial attack that perturbs high-level SSL feature representations instead of raw audio waveforms, achieving supe…
DASH introduces a dual-branch distillation framework to effectively compress class-conditional diffusion models by independently supervising both score branches, significantly preserving guidance fide…