cs.SD · Empirical

Instantaneous Pitch Estimation via Wave-U-Net-Based Fundamental Waveform Enhancement

Jun 12, 2026

AI Summary (llama-3.1-8b-instruct)

A Wave-U-Net model is trained to extract a fundamental waveform from input speech signals for accurate and robust instantaneous pitch estimation.

The approach formulates fundamental waveform filtering as a speech enhancement problem and uses a deep learning model for instantaneous pitch estimation.

Keywords

instantaneous pitch estimation, speech enhancement, Wave-U-Net, fundamental waveform

Applications

→Speech prosody analysis
→Singing technique analysis
→Musical instrument analysis
→Degraded speech signal analysis

Skill Ladder

To understand this paper, make sure you know these concepts first:

Deep learning
Speech enhancement
Fundamental frequency estimation

Abstract

Instantaneous pitch estimation plays an important role in analyzing steep pitch variations, such as those found in speech prosody and singing techniques. Conventional approaches estimate instantaneous frequency after isolating the fundamental waveform from signals that contain harmonics and noise, which makes the accuracy sensitive to imperfect fundamental filtering. In this study, we formulate fundamental waveform filtering as a speech enhancement problem. Specifically, we train a Wave-U-Net model to extract a fundamental waveform from an input speech signal. The instantaneous pitch is then obtained by computing the instantaneous frequency from the analytic signal of the estimated fundamental waveform. Experimental results show that the proposed method outperforms conventional deterministic approaches and provides accurate and robust instantaneous pitch estimation across diverse domains, including speech, singing voice, musical instruments, and degraded speech signals.
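The final step described in the abstract, computing instantaneous frequency from the analytic signal of a fundamental waveform, can be illustrated with a minimal sketch. This is not the paper's implementation: the Wave-U-Net stage is omitted, and a synthetic sinusoid with a gliding frequency stands in for the estimated fundamental waveform. The function names `analytic_signal` and `instantaneous_frequency`, the 16 kHz sample rate, and the 120 to 180 Hz glide are all assumptions chosen for illustration.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal: keep DC/Nyquist, double positive bins, zero negative bins."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_frequency(fundamental, sample_rate):
    """Instantaneous frequency in Hz from the unwrapped phase of the analytic signal."""
    phase = np.unwrap(np.angle(analytic_signal(fundamental)))
    # Finite difference of the phase, converted from rad/sample to Hz.
    return np.diff(phase) * sample_rate / (2.0 * np.pi)


# Assumed stand-in for an estimated fundamental waveform:
# one second of a sinusoid gliding linearly from 120 Hz to 180 Hz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
f0_true = 120.0 + 60.0 * t
fundamental = np.sin(2.0 * np.pi * np.cumsum(f0_true) / sample_rate)

f0_est = instantaneous_frequency(fundamental, sample_rate)

# Compare away from the segment edges, where the FFT-based transform is least reliable.
core = slice(1000, -1000)
mean_abs_err = np.mean(np.abs(f0_est[core] - f0_true[1:][core]))
print(f"mean absolute error in the core region: {mean_abs_err:.4f} Hz")
```

On a clean, single-component input like this one, the estimate tracks the glide sample by sample. If harmonics or noise remain in the input, the phase of the analytic signal no longer follows the fundamental alone, which is the sensitivity to imperfect fundamental filtering that the abstract points out.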