layout: page
subheadline: ""
sidebar: right
title: "Interspeech 2024"
teaser: ""
header:
    image_fullwidth: "whale.png"
categories:

This was my first Interspeech in person. For the post-pandemic edition in Seoul I decided not to travel and chaired a session remotely. It is a large conference with multiple parallel sessions and a lot of attendees.

This is a list of interesting papers that I saw at Interspeech 2024 in Kos, Greece. Note that it is neither a comprehensive list nor a ranking of the best papers in the programme. The schedule was quite packed, so I had to skip many overlapping sessions to prioritize the topics that I currently work on, i.e. bioacoustics.

To sum up, there is a lot of interest in self-supervised models, with yet another benchmark to evaluate them on, SUPERB 2.0. There are countless papers applying these models, or even powerful supervised models such as Whisper, to various downstream tasks.

There is also increasing interest in using neural codecs/tokens rather than spectrograms, waveforms, or embeddings from other models. This is reflected in a new challenge, The Interspeech 2024 Challenge on Speech Processing Using Discrete Units, and in several very good papers on this topic.

Although bioacoustics was a special theme this year, I did not see many papers on this topic. This was surprising, but I guess it was good for the VIHAR workshop that ESP organized as a satellite event of Interspeech.

Detection and classification of bioacoustics signals

Generative speech enhancement

Speech benchmarks

Neural codecs

Foundation models