ArXivCSExplorer
Built by Teycir Ben Soltane

8 results for “instantaneous pitch estimation”

CS papers only

Hybrid search: keyword + semantic, ranked by combined score.

cs.SD • Empirical • Recent • Jun 12, 2026

Instantaneous Pitch Estimation via Wave-U-Net-Based Fundamental Waveform Enhancement

Junya Koguchi, Tomoki Koriyama

A Wave-U-Net model is trained to extract a fundamental waveform from input speech signals for accurate and robust instantaneous pitch estimation.

eess.AS • Empirical • Recent • Jun 12, 2026

Unsupervised Approaches for Global Prosodic Embedding Extraction

Martin Meza, Luciana Ferrer, Pablo Riera

The paper proposes methods for generating global prosodic embeddings using auto-encoder models of pitch and energy, demonstrating competitive or superior performance under challenging conditions.

eess.AS • cs.CL • Recent • May 28, 2026

Extracting accent features in spoken Brazilian Portuguese without sociolinguistic labels

Pedro H. L. Leite, Pedro Benevenuto Valadares, Luiz W. P. Biscainho

The paper proposes a novel workflow to extract fine-grained regional accent features in Brazilian Portuguese using only acoustic labels and a phoneme-based forced aligner, showing that localized featu…

cs.SD • cs.AI • eess.AS • Recent • Jun 1, 2026

Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space

Louis Mouchon

Echo is a joint-embedding predictive architecture that uses a single, pretrained ViT encoder to simultaneously perform speaker diarization, speech recognition, and dynamic source separation in a share…

cs.CL • cs.AI • eess.AS • Recent • May 31, 2026

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more

PolySpeech-100 introduces a massive, multi-lingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…

cs.CL • cs.AI • cs.SD • Recent • May 28, 2026

MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs

Daeyong Kwon, Qiyu Wu, Shinobu Kuriya, Junghyun Koo +5 more

The paper introduces MusTBENCH, a new benchmark, and MusT, an optimization recipe, to rigorously test and improve the ability of Large Audio-Language Models (LALMs) to accurately ground their musical…

cs.LG • cs.CL • eess.SP • Recent • May 31, 2026

Beyond Sinusoids: A Morlet Wavelet Framework for Transformer Positional Encoding

Athanasios Zeris

The paper introduces Morlet Positional Encoding (MoPE), a novel wavelet-based positional encoding that models position and locality simultaneously, outperforming standard sinusoidal and RoPE methods.

cs.LG • cs.AI • eess.AS • Recent • May 31, 2026

MURMUR: An Efficient Inference System for Long-Form ASR

Wei-Tzu Lee, Keisuke Kamahori, Baris Kasikci

Murmur is an efficient inference system for long-form ASR that resolves the accuracy-latency trade-off by optimizing both inter-chunk processing and intra-chunk attention mechanisms.
