Unsupervised animal vocalization denoising

Biodenoising

how to denoise animal vocalizations without clean data?

Introduction

We present Biodenoising, a new method for animal vocalization denoising that does not require access to clean data. There are two core ideas behind Biodenoising:

We leverage existing speech enhancement models: there is no need to start from scratch. Speech enhancement models have been trained on large amounts of data and have learned something about patterns in audio time series.
Following the same logic, there is no need to train a separate model for each animal dataset. Because many signal characteristics recur at different scales and frequencies, a model that has seen more diverse data can be more robust to unseen conditions and generalize better.

There is an eloquent video about how these audio patterns work for whales and birds.

We publish the pre-print on arXiv.

Marius Miron, Sara Keen, Jen-Yu Liu, Benjamin Hoffman, Masato Hagiwara, Olivier Pietquin, Felix Effenberger, Maddie Cusimano, “Biodenoising: animal vocalization denoising without access to clean data”

Along with the pre-print, we publish the pip-installable Python libraries ‘biodenoising’, ‘biodenoising-inference’, and ‘biodenoising-datasets’, which can be used to denoise animal vocalizations and download the datasets.

We base our work on the speech enhancement models demucs dns 48 and CleanUNet because they are small and fast to train. Demucs worked particularly well. Performance may improve further by training newer architectures.

Abstract

Animal vocalization denoising is a task similar to human speech enhancement, a well-studied field of research. In contrast to the latter, it is applied to a higher diversity of sound production mechanisms and recording environments, and this higher diversity is a challenge for existing models. Adding to the challenge and in contrast to speech, we lack large and diverse datasets comprising clean vocalizations. As a solution, we use as training data pseudo-clean targets, i.e. pre-denoised vocalizations, and segments of background noise without a vocalization. We propose a training set derived from bioacoustics datasets and repositories representing diverse species, acoustic environments, and geographic regions. Additionally, we introduce a non-overlapping benchmark set comprising clean vocalizations from different taxa and noise samples. We show that denoising models (demucs, CleanUNet) trained on pseudo-clean targets obtained with speech enhancement models achieve competitive results on the benchmarking set. We publish data, code, libraries, and demos at https://mariusmiron.com/research/biodenoising.
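
As a rough illustration of how training examples of this kind might be assembled, the sketch below mixes a pseudo-clean target with a background-noise segment at a chosen SNR to form a noisy input. It is not the Biodenoising code: the function names (rms, mix_at_snr), the 16 kHz sample rate, the 0 dB SNR, and the synthetic tone and white noise standing in for real audio are all assumptions made for this example.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(target, noise, snr_db):
    """Scale `noise` so the target-to-noise ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration only. Returns the mixture and the
    scaled noise so the achieved SNR can be checked afterwards.
    """
    if len(target) != len(noise):
        raise ValueError("target and noise must have the same length")
    target_rms, noise_rms = rms(target), rms(noise)
    if target_rms == 0 or noise_rms == 0:
        raise ValueError("target and noise must not be silent")
    # SNR_dB = 20 * log10(rms_target / rms_noise); solve for the noise gain.
    gain = target_rms / (noise_rms * 10 ** (snr_db / 20))
    scaled_noise = [gain * n for n in noise]
    mixture = [t + n for t, n in zip(target, scaled_noise)]
    return mixture, scaled_noise


# Stand-ins for real audio (assumptions): a 440 Hz tone plays the role of a
# pseudo-clean target, white noise plays the role of a noise-only segment.
rng = random.Random(0)
sample_rate = 16000  # assumed for the example
pseudo_clean = [math.sin(2 * math.pi * 440 * i / sample_rate)
                for i in range(sample_rate)]
noise_segment = [rng.uniform(-1.0, 1.0) for _ in range(sample_rate)]

# One (noisy input, pseudo-clean target) training pair at 0 dB SNR.
noisy_input, scaled_noise = mix_at_snr(pseudo_clean, noise_segment, snr_db=0.0)
achieved_snr = 20 * math.log10(rms(pseudo_clean) / rms(scaled_noise))
```

A denoising model would then be trained to map noisy_input back to pseudo_clean. With real data one would typically work on arrays loaded from audio files rather than Python lists.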

Benchmarking dataset

We introduce a benchmarking dataset for animal vocalization denoising, Biodenoising_validation. It contains 62 pairs of clean animal vocalizations and noise excerpts. We list some audio demos from this dataset below. Details about the training data can be found at the end of this page.

Audio demos

Here we look at the zero-shot performance of the methods on the benchmarking dataset, i.e. generalization to unseen taxa and noise. None of the methods has seen or been adapted to the tested datasets, so performance may improve with self-training on those data. We are currently working on such a method.

First, we compare the original noisy file with our denoising model trained on pseudo-clean targets (biodenoising) and two state-of-the-art methods: noisereduce and noisy target training.

Original	Biodenoising	Noisereduce	Noisy target

How well does it do on longer recordings?

Original	Biodenoising	Noisereduce

Recording animals in the lab does not always yield clean vocalizations. These zebra finches, recorded with a close microphone, are noisy: you can hear the fan, the wings, and the hopping. Noisereduce works great on the fan noise, but it cannot do a good job on the wings and the hopping.

Original	Biodenoising	Noisereduce

The most difficult condition is denoising biologger recordings, like this one of a carrion crow. Again, the wind and the self-noise are very loud.

Original	Biodenoising	Noisereduce

Underwater conditions tend to be noisier than terrestrial conditions. These models were not trained to operate below -5 dB SNR, but they can still perform reasonably well. Here you can find recordings of orcas from Orcasound and of South-Alaska humpback whales recorded by Michelle Fournet.
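
To give a sense of what an SNR of -5 dB means, the short sketch below converts a decibel value into a linear power ratio using the standard definition SNR_dB = 10 log10(P_signal / P_noise). The helper names are made up for this example and are not part of any Biodenoising library.

```python
import math


def db_to_power_ratio(snr_db):
    """Convert an SNR in dB to the linear signal-to-noise power ratio."""
    return 10 ** (snr_db / 10)


def power_ratio_to_db(ratio):
    """Convert a linear signal-to-noise power ratio back to dB."""
    return 10 * math.log10(ratio)


signal_to_noise = db_to_power_ratio(-5.0)
noise_to_signal = 1 / signal_to_noise
print(f"At -5 dB SNR the noise carries {noise_to_signal:.2f}x the signal power")
# prints: At -5 dB SNR the noise carries 3.16x the signal power
```

In other words, at -5 dB the noise already has roughly three times the power of the vocalization, and recordings below that level are noisier still.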

Original	Biodenoising	Noisereduce

My favorite recording is the one of a bowhead whale from the Watkins Marine Mammals dataset. Note that, in contrast to the examples above, this noisy recording was pre-cleaned using speech enhancement models and then used in training. This recording motivated me to start this project.

Original	Biodenoising	Noisereduce

Training dataset description

Noisy datasets	Hours	Medium	Private	Direct	Link	Type
Dolphin signature whistles	0.23	underwater	yes	no	link	dolphins
Hainan Gibbons	1.11	terrestrial	no	yes	link	gibbons
Geladas	2.23	terrestrial	yes	no	link	geladas
Orcasound Aldev	0.25	underwater	no	yes	link	orcas
Thyolo	0.61	terrestrial	no	yes	link	birds
Anuran	1.13	terrestrial	no	no	link	frogs
South-Alaska humpback whale	14.13	underwater	yes	no	link	cetaceans
Orcasound SKRW	2.41	underwater	no	yes	link	orcas
Black and white ruffed lemur	1.06	terrestrial	no	yes	link	lemurs
Orcasound humpback whale	0.8	underwater	no	yes	link	orcas
Orchive	0.03	underwater	no	yes	link	orcas
Whydah	0.57	terrestrial	no	yes	link	birds
Sabiod NIPS4B	0.55	underwater	no	yes	link	cetaceans
Xeno canto labeled subset	6.82	terrestrial	no	yes	link	birds
ASA Berlin	4.69	terrestrial	no	no	link	various
Watkins	5.33	underwater	no	yes	link	various
Macaques coo calls	0.7	terrestrial	no	yes	link	macaques

Noise datasets	Hours	Medium	Private	Direct	Link	Type
FSD50k subset	26.34	terrestrial	no	yes	link	various
IDMT Traffic	9.72	terrestrial	no	yes	link	streets
ShipsEar	3.55	underwater	yes	no	link	ships
DeepShip subset	1.78	underwater	no	yes	link	ships
Orcasound ship noise	7.23	underwater	no	yes	link	ships
TUT 2016 subset	0.33	terrestrial	no	yes	link	home

Extracted noise	Hours	Medium	Private	Direct	Link	Type
MARS MBARI	0.5	underwater	no	yes	link	various
NOAA Sanctsound	47.48	underwater	no	yes	link	various
Orcasound best os	1.6	underwater	no	yes	link	various
South-Alaska humpback whale	114.85	underwater	yes	no	link	various

Bibtex

@misc{miron2024biodenoisinganimalvocalizationdenoising,
  title={Biodenoising: animal vocalization denoising without access to clean data},
  author={Marius Miron and Sara Keen and Jen-Yu Liu and Benjamin Hoffman and Masato Hagiwara and Olivier Pietquin and Felix Effenberger and Maddie Cusimano},
  year={2024},
  eprint={2410.03427},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2410.03427}
}