Unsupervised animal vocalization denoising

Biodenoising

How to denoise animal vocalizations without clean data?

Introduction

We present Biodenoising, a new method for animal vocalization denoising that does not require access to clean data. There are two core ideas behind Biodenoising:

  • We leverage existing speech enhancement models: there is no need to start from scratch. These models have been trained on large amounts of data and have already learned patterns that occur in audio time series.
  • Following the same logic, there is no need to train a separate model for each animal dataset. Since many signal characteristics recur at different time scales and frequencies, a model that has seen more diverse data can be more robust to unseen conditions and generalize better.

There is an eloquent video about how these audio patterns work for whales and birds.

We publish the pre-print on arXiv.

Marius Miron, Sara Keen, Jen-Yu Liu, Benjamin Hoffman, Masato Hagiwara, Olivier Pietquin, Felix Effenberger, Maddie Cusimano, “Biodenoising: animal vocalization denoising without access to clean data”

Along with the pre-print, we publish the pip-installable Python libraries ‘biodenoising’, ‘biodenoising-inference’, and ‘biodenoising-datasets’, which can be used to denoise animal vocalizations and download the datasets.

GitHub | GitHub inference | GitHub datasets | Colab

We base our work on the speech enhancement models Demucs DNS48 and CleanUNet because they are small and fast to train. Demucs worked particularly well. Training newer architectures may improve performance further.

Abstract

Animal vocalization denoising is a task similar to human speech enhancement, a well-studied field of research. In contrast to the latter, it is applied to a higher diversity of sound production mechanisms and recording environments, and this higher diversity is a challenge for existing models. Adding to the challenge, and in contrast to speech, we lack large and diverse datasets comprising clean vocalizations. As a solution, we use as training data pseudo-clean targets, i.e. pre-denoised vocalizations, and segments of background noise without a vocalization. We propose a training set derived from bioacoustics datasets and repositories representing diverse species, acoustic environments, and geographic regions. Additionally, we introduce a non-overlapping benchmark set comprising clean vocalizations from different taxa and noise samples. We show that denoising models (Demucs, CleanUNet) trained on pseudo-clean targets obtained with speech enhancement models achieve competitive results on the benchmarking set. We publish data, code, libraries, and demos at https://mariusmiron.com/research/biodenoising.
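The training recipe described in the abstract can be sketched in a few lines: take a pseudo-clean target, take a noise-only segment, and mix them at a chosen signal-to-noise ratio to obtain a (noisy input, target) pair. The sketch below is an illustration under assumptions, not the code of the biodenoising library: the function name `mix_at_snr`, the synthetic stand-in signals, and the sampled SNR range are all hypothetical.

```python
import numpy as np


def mix_at_snr(target, noise, snr_db):
    """Add `noise` to `target` so that the target-to-noise power ratio is `snr_db`.

    Hypothetical helper for illustration; returns (mixture, scaled_noise).
    """
    target = np.asarray(target, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Loop the noise if it is shorter than the target, then crop to the same length.
    if len(noise) < len(target):
        repeats = int(np.ceil(len(target) / len(noise)))
        noise = np.tile(noise, repeats)
    noise = noise[: len(target)]

    target_power = np.mean(target ** 2)
    noise_power = np.mean(noise ** 2)
    if target_power == 0 or noise_power == 0:
        raise ValueError("target and noise must both be non-silent")

    # Solve target_power / (gain**2 * noise_power) = 10**(snr_db / 10) for gain.
    gain = np.sqrt(target_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    return target + scaled_noise, scaled_noise


rng = np.random.default_rng(0)
sample_rate = 16000  # assumed sample rate, for illustration only
t = np.arange(sample_rate) / sample_rate

# Stand-in for a pre-denoised (pseudo-clean) vocalization.
pseudo_clean = 0.5 * np.sin(2 * np.pi * 2000 * t)
# Stand-in for a background-noise segment without a vocalization.
noise_segment = rng.normal(0.0, 0.1, sample_rate // 2)

# Assumed SNR range; only the -5 dB lower bound is mentioned on this page.
snr_db = rng.uniform(-5.0, 10.0)
noisy_input, scaled_noise = mix_at_snr(pseudo_clean, noise_segment, snr_db)
```

A denoising model would then be trained to map `noisy_input` back to `pseudo_clean`. Because the target is itself the output of a speech enhancement model, any residual noise in it becomes part of what the model learns to keep.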

Benchmarking dataset

We introduce a benchmarking dataset for animal vocalization denoising, Biodenoising_validation. It contains 62 pairs of clean animal vocalizations and noise excerpts. We list some audio demos from this dataset below. Details about the training data can be found at the end of this page.
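Because each benchmark item pairs a clean vocalization with a noise excerpt, a denoiser's output can be scored against the clean reference. This page does not state which metrics are used, so the following is only an assumption: a minimal sketch of the scale-invariant signal-to-distortion ratio (SI-SDR), a metric commonly used in speech enhancement. The function name `si_sdr` and the synthetic signals are hypothetical stand-ins.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove any DC offset so it does not count as signal or distortion.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference; the remainder is distortion.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    projection = scale * reference
    residual = estimate - projection

    return 10 * np.log10((np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps))


rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)

# Stand-ins: a noisy mixture and a hypothetical denoiser output with less noise.
noisy = clean + rng.normal(0.0, 0.5, clean.shape)
partly_denoised = clean + rng.normal(0.0, 0.05, clean.shape)

# Improvement in dB of the denoised output over the noisy input.
improvement = si_sdr(partly_denoised, clean) - si_sdr(noisy, clean)
```

Reporting the improvement over the noisy input, rather than the raw score, makes results comparable across recordings with different starting noise levels.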

Audio demos

Here we look at the zero-shot performance of the methods on the benchmarking dataset, i.e. generalization to unseen taxa and noise. None of the methods has seen or been adapted to the tested datasets, so performance may improve with self-training on those data. We are currently working on such a method.

First, we compare the original noisy file with our denoising model trained on pseudo-clean targets (biodenoising) and with two state-of-the-art methods, noisereduce and noisy target training.

Original | Biodenoising | Noisereduce | Noisy target

How well does it do on longer recordings?

Original | Biodenoising | Noisereduce

Recording animals in the lab does not always yield clean vocalizations. These zebra finches, recorded with a close microphone, are noisy: you can hear the fan, the wings, and the hopping. Noisereduce works well on the fan noise, but it cannot remove the wing and hopping sounds.

Original | Biodenoising | Noisereduce

The most difficult condition is denoising biologger recordings, like this one of a carrion crow, where both the wind and the self-noise are very loud.

Original | Biodenoising | Noisereduce

Underwater conditions tend to be noisier than terrestrial ones. These models were not trained to operate below -5 dB SNR, but they can still perform reasonably well. Here you can find recordings of orcas from Orcasound and of South-Alaska humpback whales recorded by Michelle Fournet.

Original | Biodenoising | Noisereduce

My favorite recording is the one of a bowhead whale from the Watkins Marine Mammals dataset. Note that, in contrast to the examples above, this noisy recording was pre-cleaned using speech enhancement models and then used in training. This recording motivated me to start this project.

Original | Biodenoising | Noisereduce

Training dataset description

Noisy datasets               | Hours  | Medium      | Private | Direct | Link | Type
Dolphin signature whistles   | 0.23   | underwater  | yes     | no     | link | dolphins
Hanaian Gibbons              | 1.11   | terrestrial | no      | yes    | link | gibbons
Geladas                      | 2.23   | terrestrial | yes     | no     | link | geladas
Orcasound Aldev              | 0.25   | underwater  | no      | yes    | link | orcas
Thyolo                       | 0.61   | terrestrial | no      | yes    | link | birds
Anuran                       | 1.13   | terrestrial | no      | no     | link | frogs
South-Alaska humpback whale  | 14.13  | underwater  | yes     | no     | link | cetaceans
Orcasound SKRW               | 2.41   | underwater  | no      | yes    | link | orcas
Black and white ruffed lemur | 1.06   | terrestrial | no      | yes    | link | lemurs
Orcasound humpback whale     | 0.8    | underwater  | no      | yes    | link | orcas
Orchive                      | 0.03   | underwater  | no      | yes    | link | orcas
Whydah                       | 0.57   | terrestrial | no      | yes    | link | birds
Sabiod NIPS4B                | 0.55   | underwater  | no      | yes    | link | cetaceans
Xeno canto labeled subset    | 6.82   | terrestrial | no      | yes    | link | birds
ASA Berlin                   | 4.69   | terrestrial | no      | no     | link | various
Watkins                      | 5.33   | underwater  | no      | yes    | link | various
Macaques coo calls           | 0.7    | terrestrial | no      | yes    | link | macaques

Noise datasets               | Hours  | Medium      | Private | Direct | Link | Type
FSD50k subset                | 26.34  | terrestrial | no      | yes    | link | various
IDMT Traffic                 | 9.72   | terrestrial | no      | yes    | link | streets
ShipsEar                     | 3.55   | underwater  | yes     | no     | link | ships
DeepShip subset              | 1.78   | underwater  | no      | yes    | link | ships
Orcasound ship noise         | 7.23   | underwater  | no      | yes    | link | ships
TUT 2016 subset              | 0.33   | terrestrial | no      | yes    | link | home

Extracted noise              | Hours  | Medium      | Private | Direct | Link | Type
MARS MBARI                   | 0.5    | underwater  | no      | yes    | link | various
NOAA Sanctsound              | 47.48  | underwater  | no      | yes    | link | various
Orcasound best os            | 1.6    | underwater  | no      | yes    | link | various
South-Alaska humpback whale  | 114.85 | underwater  | yes     | no     | link | various

Bibtex

@misc{miron2024biodenoisinganimalvocalizationdenoising,
  title={Biodenoising: animal vocalization denoising without access to clean data},
  author={Marius Miron and Sara Keen and Jen-Yu Liu and Benjamin Hoffman and Masato Hagiwara and Olivier Pietquin and Felix Effenberger and Maddie Cusimano},
  year={2024},
  eprint={2410.03427},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2410.03427},
}