ArXivCSExplorer
Built by Teycir Ben Soltane

13 results for “Fundamentals of speaker diarization, speaker verification, and speaker identification”

CS papers only

Hybrid search: keyword + semantic, ranked by combined score.

eess.AS • Empirical • Recent • Jun 12, 2026

Who Spoke When in Multi-Conversation: Target Speaker Tagging Task and Benchmark

Minjae Lee, Hee-Soo Heo, Youngki Kwon, Han-Gyu Kim +2 more

The paper introduces Target Speaker Tagging (TST), a task that combines speaker diarization, verification, and identification into a single workflow for multi-speaker conversations. It presents TST-Be…

cs.SD • cs.AI • eess.AS • Recent • Jun 1, 2026

Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space

Louis Mouchon

Echo is a joint-embedding predictive architecture that uses a single, pretrained ViT encoder to simultaneously perform speaker diarization, speech recognition, and dynamic source separation in a share…

cs.SD • Empirical • Recent • Jun 12, 2026

Instantaneous Pitch Estimation via Wave-U-Net-Based Fundamental Waveform Enhancement

Junya Koguchi, Tomoki Koriyama

A Wave-U-Net model is trained to extract a fundamental waveform from input speech signals for accurate and robust instantaneous pitch estimation.

eess.AS • cs.CL • Recent • May 28, 2026

Extracting accent features in spoken Brazilian Portuguese without sociolinguistic labels

Pedro H. L. Leite, Pedro Benevenuto Valadares, Luiz W. P. Biscainho

The paper proposes a novel workflow to extract fine-grained regional accent features in Brazilian Portuguese using only acoustic labels and a phoneme-based forced aligner, showing that localized featu…

cs.CR • cs.SD • Recent • May 19, 2026

DASM: Domain-Aware Sharpness Minimization for Multi-Domain Voice Stream Steganalysis

Pengcheng Zhou, Pianran Guo, Shuhua Chen, Mengqin Zhao +2 more

The paper proposes Domain-Aware Sharpness Minimization (DASM), a novel optimizer that enhances the robustness and generalization of voice stream steganalysis models across varying data distributions.

cs.CL • cs.AI • cs.SD • Recent • May 29, 2026

Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus

Máté Gedeon, Piroska Zsófia Barta, Péter Mihajlik, Katalin Mády

The paper introduces BEA-Dialogue+, an expanded 200-hour corpus for Hungarian conversational ASR, demonstrating that while larger data is challenging, specialized fine-tuning techniques significantly…

cs.SD • cs.AI • cs.CR • Recent • Jun 4, 2026

Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition

Yifan Liao, Zongmin Zhang, Zhen Sun, Yuhui Sun +2 more

The paper introduces a novel Clean-Referenced Feature-Vocoder Attack, a black-box adversarial attack that perturbs high-level SSL feature representations instead of raw audio waveforms, achieving supe…

cs.SD • cs.CR • Recent • May 2, 2026

MelShield: Robust Mel-Domain Audio Watermarking for Provenance Attribution of AI Generated Synthesized Speech

Yutong Jin, Qi Li, Lingshuang Liu, Jianbing Ni

MelShield is a robust, in-generation audio watermarking framework that embeds identifiable signals into AI-generated speech in the Mel-spectrogram domain for reliable copyright protection and attribut…

cs.CR • cs.CL • cs.LG • Recent • May 28, 2026

Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking across Datasets, Models, and Generated Content

Bing Liu, Shunping Wang, Yufan Zhu, Xinyi Yu +4 more

This paper introduces 'implicit identity' as a unifying framework to survey and categorize LLM fingerprinting and watermarking techniques for verifying ownership and provenance across datasets, models…

eess.AS • cs.CR • cs.LG • Recent • May 4, 2026

Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models

Sandra Arcos-Holzinger, Sarah M. Erfani, James Bailey, Sanjeev Khudanpur

The paper introduces GRIDS, a framework using Local Intrinsic Dimensionality (LID) to detect anomalies in self-supervised speech model representations, showing that LID elevation correlates with ASR d…

cs.CL • cs.AI • Recent • May 27, 2026

KVoiceBench, KOpenAudioBench, and KMMAU: Agent-Driven Korean Speech Benchmarks for Evaluating SpeechLMs

Haechan Kim, Seungjun Chung, Inkyu Park, Jihoo Lee +1 more

The paper introduces three new Korean speech benchmarks (KVoiceBench, KOpenAudioBench, and KMMAU) to evaluate SpeechLMs, demonstrating that English-centric evaluation fails to capture performance gaps…

cs.CL • cs.AI • eess.AS • Recent • May 31, 2026

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more

PolySpeech-100 introduces a massive, multi-lingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…
