We present Biodenoising, a new method for animal vocalization denoising that does not require access to clean data. There are two core ideas behind Biodenoising:
There is an eloquent video about how these audio patterns work for whales and birds.
We publish the pre-print on arXiv.
Marius Miron, Sara Keen, Jen-Yu Liu, Benjamin Hoffman, Masato Hagiwara, Olivier Pietquin, Felix Effenberger, Maddie Cusimano, “Biodenoising: animal vocalization denoising without access to clean data”
Along with the pre-print, we publish three pip-installable Python libraries, ‘biodenoising’, ‘biodenoising-inference’, and ‘biodenoising-datasets’, which can be used to denoise animal vocalizations and download the datasets.
Github | Github inference | Github Datasets | Colab |
We base our work on the speech enhancement models demucs dns 48 and CleanUNet because they are small and fast to train. Demucs worked particularly well. Performance may improve further by training newer architectures.
Animal vocalization denoising is a task similar to human speech enhancement, a well-studied field of research. In contrast to the latter, it is applied to a higher diversity of sound production mechanisms and recording environments, and this higher diversity is a challenge for existing models. Adding to the challenge, and in contrast to speech, we lack large and diverse datasets comprising clean vocalizations. As a solution, we use as training data pseudo-clean targets, i.e. pre-denoised vocalizations, and segments of background noise without a vocalization. We propose a training set derived from bioacoustics datasets and repositories representing diverse species, acoustic environments, and geographic regions. Additionally, we introduce a non-overlapping benchmark set comprising clean vocalizations from different taxa and noise samples. We show that denoising models (demucs, CleanUNet) trained on pseudo-clean targets obtained with speech enhancement models achieve competitive results on the benchmarking set. We publish data, code, libraries, and demos at https://mariusmiron.com/research/biodenoising.
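The training setup described above pairs pseudo-clean targets with vocalization-free noise segments. Below is a minimal sketch of how one such pair could be turned into a (noisy input, target) training example, assuming simple additive mixing at a chosen SNR. The page does not spell out the mixing procedure, so the function names (`power`, `mix_at_snr`), the toy signals, and the 0 dB setting are illustrative assumptions and not the API of the biodenoising libraries.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(target, noise, snr_db):
    """Scale `noise` so that target power / noise power matches `snr_db`, then add it.

    Hypothetical helper: returns the mixture and the scaled noise.
    """
    if len(target) != len(noise):
        raise ValueError("target and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise segment is silent")
    # Noise power needed for the requested SNR: P_target / 10^(snr_db / 10).
    wanted_noise_power = power(target) / (10 ** (snr_db / 10))
    gain = math.sqrt(wanted_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    mixture = [t + n for t, n in zip(target, scaled_noise)]
    return mixture, scaled_noise


# Toy stand-ins (assumptions): a 440 Hz tone plays the role of a pseudo-clean
# vocalization, Gaussian noise plays the role of a background-noise segment.
rng = random.Random(0)
sample_rate = 16000
pseudo_clean = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(sample_rate)]
background = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# The model would receive `noisy_input` and be trained to output `pseudo_clean`.
noisy_input, scaled = mix_at_snr(pseudo_clean, background, snr_db=0.0)
achieved = 10 * math.log10(power(pseudo_clean) / power(scaled))
print(f"achieved SNR: {achieved:.2f} dB")
```

Plain Python lists keep the sketch dependency-free; with real audio one would more likely use array operations from a numerical library.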
We introduce a benchmarking dataset for animal vocalization denoising, Biodenoising_validation. It contains 62 pairs of clean animal vocalizations and noise excerpts. We list some audio demos from this dataset below. Details about the training data can be found at the end of this page.
Here we look at the zero-shot performance of the methods on the benchmarking dataset, i.e. generalization to unseen taxa and noise. None of the methods has seen, or been adapted to, the tested datasets, so performance may improve with self-training on those data. We are currently working on such a method.
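Because the benchmark provides clean vocalizations, a denoiser's output can be scored against a known reference. This page does not name the metric, so the sketch below uses scale-invariant SDR (SI-SDR) purely as an assumed, commonly used choice in enhancement work; the function names and toy signals are hypothetical.

```python
import math
import random


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference is silent")
    # Project the estimate onto the reference to get the scaled target part.
    alpha = dot / ref_energy
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    if residual_energy == 0:
        return float("inf")
    if target_energy == 0:
        return float("-inf")
    return 10 * math.log10(target_energy / residual_energy)


def si_sdr_improvement(denoised, noisy, clean):
    """How many dB the denoised signal gains over the unprocessed mixture."""
    return si_sdr(denoised, clean) - si_sdr(noisy, clean)


# Toy data (assumption): the "denoiser" is faked by leaving 20% of the noise.
rng = random.Random(1)
n_samples = 8000
clean = [math.sin(2 * math.pi * 300 * i / n_samples) for i in range(n_samples)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
noisy = [c + 0.5 * n for c, n in zip(clean, noise)]
denoised = [c + 0.1 * n for c, n in zip(clean, noise)]

improvement = si_sdr_improvement(denoised, noisy, clean)
print(f"SI-SDR improvement: {improvement:.1f} dB")
```

SI-SDR ignores overall gain differences between estimate and reference, which is convenient when comparing methods that rescale their output differently.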
First, we compare the original noisy file with our denoising model trained on pseudo-clean targets (biodenoising) and two state-of-the-art methods: noisereduce and noisy target training.
Original | Biodenoising | Noisereduce | Noisy target |
---|---|---|---|
How well does it do on longer recordings?
Original | Biodenoising | Noisereduce |
---|---|---|
Recording animals in the lab does not always yield clean vocalizations. In fact, these zebra finches recorded with a close mic are noisy: you can hear the fan, the wings, and the hopping. While noisereduce works great on the fan noise, it cannot do a good job on the wings and hopping.
Original | Biodenoising | Noisereduce |
---|---|---|
The most difficult condition is denoising biologger recordings, like this one of a carrion crow. Again, the wind and the self-noise are very loud.
Original | Biodenoising | Noisereduce |
---|---|---|
Underwater conditions tend to be noisier than terrestrial conditions. These models were not trained to operate below -5 dB SNR, but they can still perform reasonably well. Here you can find recordings of orcas from Orcasound and of South-Alaska humpback whales recorded by Michelle Fournet.
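To make the -5 dB figure concrete, the short calculation below converts an SNR in decibels into linear power and amplitude ratios. It is plain arithmetic on the definition of the decibel; the function names are hypothetical.

```python
def snr_db_to_power_ratio(snr_db):
    """Signal power divided by noise power for an SNR given in dB."""
    return 10 ** (snr_db / 10)


def snr_db_to_amplitude_ratio(snr_db):
    """Signal RMS amplitude divided by noise RMS amplitude for an SNR in dB."""
    return 10 ** (snr_db / 20)


ratio = snr_db_to_power_ratio(-5.0)
amp_ratio = snr_db_to_amplitude_ratio(-5.0)
print(f"signal/noise power at -5 dB: {ratio:.3f}")          # 0.316
print(f"noise/signal power at -5 dB: {1 / ratio:.2f}")      # 3.16
print(f"noise/signal amplitude at -5 dB: {1 / amp_ratio:.2f}")  # 1.78
```

At -5 dB SNR the noise therefore carries roughly three times the power of the vocalization.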
Original | Biodenoising | Noisereduce |
---|---|---|
My favorite recording is the one of a bowhead whale from the Watkins Marine Mammals dataset. Note that in contrast to the examples above this noisy recording was pre-cleaned using speech enhancement models and then used in training. This recording motivated me to start this project.
Original | Biodenoising | Noisereduce |
---|---|---|
Noisy datasets | Hours | Medium | Private | Direct | Link | Type |
---|---|---|---|---|---|---|
Dolphin signature whistles | 0.23 | underwater | yes | no | link | dolphins |
Hainan Gibbons | 1.11 | terrestrial | no | yes | link | gibbons |
Geladas | 2.23 | terrestrial | yes | no | link | geladas |
Orcasound Aldev | 0.25 | underwater | no | yes | link | orcas |
Thyolo | 0.61 | terrestrial | no | yes | link | birds |
Anuran | 1.13 | terrestrial | no | no | link | frogs |
South-Alaska humpback whale | 14.13 | underwater | yes | no | link | cetaceans |
Orcasound SKRW | 2.41 | underwater | no | yes | link | orcas |
Black and white ruffed lemur | 1.06 | terrestrial | no | yes | link | lemurs |
Orcasound humpback whale | 0.8 | underwater | no | yes | link | orcas |
Orchive | 0.03 | underwater | no | yes | link | orcas |
Whydah | 0.57 | terrestrial | no | yes | link | birds |
Sabiod NIPS4B | 0.55 | underwater | no | yes | link | cetaceans |
Xeno canto labeled subset | 6.82 | terrestrial | no | yes | link | birds |
ASA Berlin | 4.69 | terrestrial | no | no | link | various |
Watkins | 5.33 | underwater | no | yes | link | various |
Macaques coo calls | 0.7 | terrestrial | no | yes | link | macaques |

Noise datasets | Hours | Medium | Private | Direct | Link | Type |
---|---|---|---|---|---|---|
FSD50k subset | 26.34 | terrestrial | no | yes | link | various |
IDMT Traffic | 9.72 | terrestrial | no | yes | link | streets |
ShipsEar | 3.55 | underwater | yes | no | link | ships |
DeepShip subset | 1.78 | underwater | no | yes | link | ships |
Orcasound ship noise | 7.23 | underwater | no | yes | link | ships |
TUT 2016 subset | 0.33 | terrestrial | no | yes | link | home |

Extracted noise | Hours | Medium | Private | Direct | Link | Type |
---|---|---|---|---|---|---|
MARS MBARI | 0.5 | underwater | no | yes | link | various |
NOAA Sanctsound | 47.48 | underwater | no | yes | link | various |
Orcasound best os | 1.6 | underwater | no | yes | link | various |
South-Alaska humpback whale | 114.85 | underwater | yes | no | link | various |
@misc{miron2024biodenoisinganimalvocalizationdenoising,
  title={Biodenoising: animal vocalization denoising without access to clean data},
  author={Marius Miron and Sara Keen and Jen-Yu Liu and Benjamin Hoffman and Masato Hagiwara and Olivier Pietquin and Felix Effenberger and Maddie Cusimano},
  year={2024},
  eprint={2410.03427},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2410.03427},
}