Speech Denoising – Audio Sample Results

DenoiseNet is a U-Net-based deep learning model designed for speech denoising. It was trained on 2.5 hours of English speech from the LibriSpeech dataset, corrupted with various babble-noise samples at 0 dB SNR. The model provides state-of-the-art denoising performance at low computational complexity, making it suitable for real-time applications: it processes a 3-10 second audio clip in roughly 0.1 seconds on a standard machine.
This page presents qualitative audio results of our DenoiseNet model for three speech samples corrupted by babble noise at 0 dB SNR. For each sample, the same noise realization is used across all degraded and enhanced versions. A clean reference signal is provided for perceptual comparison. The following variants are available:
(i) noisy input, (ii) early denoising model, (iii) final denoising model, (iv) clean reference.
Note that these samples are not part of the training dataset; they come from a different dataset entirely and were selected to demonstrate the model's generalization capabilities.
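The noisy inputs are constructed by mixing clean speech with babble noise at a target SNR of 0 dB, i.e. equal signal and noise power. A minimal sketch of such a mixing routine (an illustrative implementation, not the project's actual data pipeline) might look like:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return clean + scaled noise such that the mixture has the target SNR (dB)."""
    # Tile/truncate the noise to match the length of the clean signal.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]
    # Average power of each signal.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain on the noise so that p_clean / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise
```

With `snr_db=0.0`, the scaled noise carries exactly as much power as the speech, which is the degradation condition used for all samples on this page.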

Sample 1 — Speaker LJ001-0003

Noisy - 0 dB SNR
Early denoising - 3.91 dB SNR
Final denoising - 8.01 dB SNR
Clean reference

Sample 2 — Speaker Russian Man 1

Noisy - 0 dB SNR
Early denoising - 3.85 dB SNR
Final denoising - 6.56 dB SNR
Clean reference

Sample 3 — Speaker LJ003-0011

Noisy - 0 dB SNR
Early denoising - 7.25 dB SNR
Final denoising - 7.66 dB SNR
Clean reference
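The per-sample SNR figures above compare each signal against the clean reference, treating the residual as noise. A small sketch of that metric (assuming time-aligned signals; this is the standard definition, not necessarily the exact evaluation script used here):

```python
import numpy as np

def snr_db(clean: np.ndarray, estimate: np.ndarray) -> float:
    """SNR (dB) of `estimate` relative to `clean`; the residual counts as noise."""
    residual = estimate - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))
```

By this definition the noisy inputs measure 0 dB, and the gap between the early and final model rows (e.g. 3.91 dB vs. 8.01 dB for Sample 1) quantifies how much residual noise the final model removes.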