Code & data

Code is made available through GitHub, either in my own repository or in those of the research groups I have been with, MTG and SMC-INESC.

Apart from code that I could not publish because of non-disclosure agreements, the research presented in the papers is documented in the repositories referenced in those papers, along with links to the datasets and instructions on how to replicate the experiments.

Mirdata and soundata


Programmatic dataset loaders in Python

During 2020-2021 I supervised a team of two MTG interns who worked part-time on implementing loaders for MTG datasets in the mirdata and soundata libraries, which allow for quick prototyping of software and machine learning models using these datasets. We collaborated closely with the main authors of these libraries, Magdalena Fuentes (NYU) and Rachel Bittner (Spotify).



Deep Convolutional Neural Networks for Musical Source Separation

This repository contains classes for data generation and preprocessing, useful when training neural networks on large datasets that do not fit into memory. It also includes classes to query samples of instrument sounds from the RWC instrument sound dataset. The ‘examples’ folder contains use cases of these classes for music source separation.
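As a rough illustration of the general idea of batch-wise data generation (a minimal sketch, not the code from the repository), the generator below loads files only when a batch is requested, so at most one batch is held in memory at a time. The names `batch_generator`, `load_fn` and `fake_load` are hypothetical and chosen for this example only; in practice `load_fn` would read precomputed features from disk.

```python
import random
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_generator(
    file_paths: Sequence[str],
    load_fn: Callable[[str], T],
    batch_size: int,
    shuffle: bool = True,
    seed: int = 0,
) -> Iterator[List[T]]:
    """Yield batches of loaded items, reading one batch at a time.

    Hypothetical helper for illustration; not the repository's API.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    order = list(file_paths)
    if shuffle:
        # A seeded shuffle keeps the order reproducible between runs.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        # Files are loaded here, on demand, so only the current
        # batch needs to fit into memory.
        yield [load_fn(path) for path in order[start:start + batch_size]]


def fake_load(path: str) -> str:
    """Stand-in loader (assumption): returns the path instead of reading a file."""
    return path


paths = [f"feature_{i}.bin" for i in range(5)]
batches = list(batch_generator(paths, fake_load, batch_size=2, shuffle=False))
print([len(batch) for batch in batches])
```

The last batch may be smaller than `batch_size` when the number of files is not a multiple of it; a training loop can either accept the short batch or skip it.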



beatStation: an openFrameworks tapping recorder

beatStation was designed as a game-with-a-purpose application in which users compete with each other in tapping along to various songs. Researchers can use it to annotate audio or conduct experiments, and it can serve as inspiration for future apps.

Live drum transcription

This is an audio drum transcription algorithm for Pure Data, Max/MSP and Max4Live, which can transcribe kick, snare, and hi-hat from live drum performances. The software takes live audio or audio files as input and triggers events for each drum type as output.

PHENICX anechoic dataset


This dataset includes audio and annotations useful for tasks such as score-informed source separation, score following, multi-pitch estimation, transcription or instrument detection, in the context of symphonic music. The dataset was presented and used in the evaluation of:

M. Miron, J. Carabias-Orti, J. J. Bosch, E. Gómez and J. Janer, “Score-informed source separation for multi-channel orchestral recordings”, Journal of Electrical and Computer Engineering (2016).