Sound and Music Computing conference
Between July 5th and 9th I attended the Sound and Music Computing Conference in Helsinki, Finland. I presented a paper on generating training data for deep learning source separation methods, particularly in classical music, where you have the score but no multi-track data. The slides can be found online and the code is in the DeepConvSep source separation repository on GitHub.
I had the opportunity to visit the acoustics lab at Aalto University and attend a few demos. I got to see the anechoic rooms where they recorded the orchestra dataset which I annotated and used in my paper on score-informed orchestral separation. Interestingly, in one of the demos, Jukka Patynen convolved close-microphone recordings with impulse responses taken from famous concert venues, to demonstrate how different the same recording can sound in various halls.
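As an aside, the basic idea behind that demo, convolving a dry close-mic recording with a measured room impulse response, is easy to sketch in a few lines of Python. This is only a rough illustration with placeholder file names, not the processing used in the actual demo, which presumably involved multichannel responses and careful calibration.

```python
# Rough sketch: "placing" a dry close-mic recording in a hall by convolving it
# with a measured impulse response. File names are placeholders.
import soundfile as sf
from scipy.signal import fftconvolve

dry, sr = sf.read("close_mic_recording.wav")   # close-mic / anechoic source
ir, sr_ir = sf.read("concert_hall_ir.wav")     # measured hall impulse response
assert sr == sr_ir, "resample one signal if the sample rates differ"

# Keep things mono for simplicity.
if dry.ndim > 1:
    dry = dry.mean(axis=1)
if ir.ndim > 1:
    ir = ir.mean(axis=1)

wet = fftconvolve(dry, ir, mode="full")        # the convolution reverb itself
wet /= abs(wet).max()                          # normalize to avoid clipping
sf.write("recording_in_hall.wav", wet, sr)
```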
There were quite a few interesting posters, of which I mention a few:
- An interesting app which treats the learning process not as a game but as a collaborative process, Burns et al.: Learning to Play the Guitar at the Age of Interactive and Collaborative Web
- Source separation for voice removal, then DTW for automatic accompaniment (a minimal DTW sketch follows this list), Wada et al.: An Adaptive Karaoke System that Plays Accompaniment Parts of Music Audio Signals Synchronously with Users’ Singing Voices
- Kirkbride: Troop: A Collaborative Tool for Live Coding
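Since DTW comes up both in the karaoke system above and in the piano alignment paper mentioned further down, here is a minimal, purely illustrative dynamic time warping sketch in NumPy. The feature sequences and dimensions are toy placeholders; the actual systems use their own features and, for live accompaniment, online variants of the algorithm.

```python
# Minimal DTW sketch: aligning two feature sequences (e.g. frames of the user's
# singing and of the reference vocal part). Illustrative only.
import numpy as np

def dtw_path(X, Y):
    """X: (n, d), Y: (m, d) feature sequences. Returns a list of (i, j) pairs."""
    n, m = len(X), len(Y)
    # Pairwise Euclidean cost between every frame of X and every frame of Y.
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)

    # Accumulated cost with the usual three-step recursion.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                 acc[i, j - 1],      # deletion
                                                 acc[i - 1, j - 1])  # match

    # Backtrack from the end to recover the optimal warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy usage: two short random feature sequences of different lengths.
rng = np.random.default_rng(0)
path = dtw_path(rng.normal(size=(20, 13)), rng.normal(size=(30, 13)))
```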
From the presented papers, I was mostly interested in the MIR ones using deep learning, but there were also some other interesting ones:
- Emotion recognition using many features and MFCCs with two CNN branches for each emotion, Malik et al.: Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition
- In contrast to other architectures which use large filters, they use very small filters directly on the raw waveform (an illustrative sketch of this idea follows the list), Lee et al.: Sample-Level Deep Convolutional Neural Networks for Music Auto-Tagging Using Raw Waveforms
- Basically in-ear filtering and mixing for hearing protection, Albrecht et al.: Electronic Hearing Protection for Musicians
- Meter detection in symbolic files, McLeod et al.: Meter Detection in Symbolic Music Using a Lexicalized PCFG
- Transcription with RNNs and alignment with DTW for piano recordings, Kwon et al.: Audio-to-Score Alignment of Piano Music Using RNN-Based Automatic Music Transcription
- Orchestration using an RBM with a binarized version of the piano roll as input, Crestel et al.: Live Orchestral Piano, a System for Real-Time Orchestral Music Generation
- Active listening; they change onsets and pitches of the voice, Ojima et al.: A Singing Instrument for Real-Time Vocal-Part Arrangement of Music and Audio Signals
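To make the small-filter idea from the Lee et al. paper a bit more concrete, below is an illustrative PyTorch sketch of a 1-D convolutional front end with size-3 filters operating directly on raw audio samples. The layer widths and the tagging head are arbitrary placeholders of mine, not the authors' exact architecture.

```python
# Illustrative sketch of the small-filter idea: a stack of 1-D convolutions with
# tiny (size-3) filters applied directly to the raw waveform, instead of a single
# layer with very long filters. Layer sizes are placeholders.
import torch
import torch.nn as nn

class SampleLevelFrontEnd(nn.Module):
    def __init__(self, n_tags=50):
        super().__init__()
        layers, channels_in = [], 1
        # Each block: conv with filter size 3, then max-pool by 3, so the
        # receptive field grows multiplicatively while the filters stay tiny.
        for channels_out in [64, 64, 128, 128, 256]:
            layers += [
                nn.Conv1d(channels_in, channels_out, kernel_size=3, padding=1),
                nn.BatchNorm1d(channels_out),
                nn.ReLU(),
                nn.MaxPool1d(3),
            ]
            channels_in = channels_out
        self.features = nn.Sequential(*layers)
        self.head = nn.Linear(channels_in, n_tags)

    def forward(self, waveform):            # waveform: (batch, 1, samples)
        x = self.features(waveform)
        x = x.mean(dim=-1)                  # global average pooling over time
        return torch.sigmoid(self.head(x))  # multi-label tag probabilities

# Toy usage on roughly one second of 16 kHz audio.
model = SampleLevelFrontEnd()
tags = model(torch.randn(2, 1, 16000))      # -> shape (2, 50)
```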
The proceedings can be found online.
I really liked the keynote from Anssi Klapuri. A few years ago he moved from academia to Yousician, a company that develops apps for music learning. He presented the app and the architecture of their audio engine (they use Unity for graphics, but the audio runs on their own engine). Interestingly, most of the audio path involves minimal processing, and the signal is routed to the output as soon as possible. The cross-cancellation seemed quite robust. I also liked their testing techniques: they record the impulse responses of different phones and instruments so they can reuse this data afterwards for testing.