Learn how to generate the quantized MIDI version of an audio file with Python in Ubuntu 18.04

How to extract the melody from an audio file and export it to MIDI (generate quantized MIDI) using Python in Ubuntu 18.04

Melody extraction is the task of automatically estimating the fundamental frequency corresponding to the pitch of the predominant melodic line of a piece of polyphonic (or homophonic) music. Other names for this task include Audio Melody Extraction, Predominant Melody Extraction, Predominant Melody Estimation and Predominant Fundamental Frequency (F0) Estimation.

In this article, we will explain how to generate a MIDI version of a WAV (raw audio format) file in Ubuntu 18.04.

1. Install Melodia melody extraction Vamp plugin

The MELODIA plug-in automatically estimates the pitch of a song's main melody. More specifically, it implements an algorithm that automatically estimates the fundamental frequency corresponding to the pitch of the predominant melodic line of a piece of polyphonic (or homophonic or monophonic) music. Given a song, the algorithm estimates:

  1. When the melody is present and when it is not (a.k.a. voicing detection)
  2. The pitch of the melody when it is present

A friendly introduction to melody extraction for non-scientists, as well as to the algorithm, including graphs and sound examples, can be found at www.justinsalamon.com/melody-extraction. As a first step, you need to download Melodia and make it available on your system. The Melodia download page contains a download form. You can of course fill it in with fake information to move on:

(Image: Melodia download form)

Once you submit the form, the download will start and you will get a compressed archive. Then, extract all the content of the archive:

(Image: content of the Melodia download)

Place the extracted content in the /usr/local/lib/vamp directory (create the vamp directory if it does not exist). Note that you will need root access to write to this directory.
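To confirm that the files ended up in the right place, you can list the shared libraries in the plugin directory. The following is a minimal sketch: the function name is hypothetical, and it only assumes that Vamp plugins on Linux are shipped as .so files, without checking for any specific file name.

```python
import os


def find_plugin_libraries(directory="/usr/local/lib/vamp"):
    """Return the sorted names of the shared libraries (.so files) in a directory."""
    if not os.path.isdir(directory):
        # The directory has not been created yet
        return []
    return sorted(name for name in os.listdir(directory) if name.endswith(".so"))


# An empty list means that nothing was extracted into the directory
print(find_plugin_libraries())
```

If the printed list is empty, check that the directory exists and that the content was extracted directly into it.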

2. Clone audio_to_midi_melodia project

After installing Melodia, you need to obtain the source code of the Audio to MIDI project, available on GitHub. You can quickly clone the code into a directory on your computer by running the following command:

git clone https://github.com/justinsalamon/audio_to_midi_melodia.git

The audio_to_midi_melodia Python script allows you to extract the melody of a song and save it to a MIDI file. The script uses the Melodia algorithm to perform melody extraction, taking advantage of the vamp module, which allows running Vamp plugins (like Melodia) directly in Python. Once the pitch contour of the melody is extracted, the next (non-trivial!) step is to segment it into notes and quantize the pitch of each note, producing a discrete series of notes that can then be exported to a symbolic format such as MIDI or JAMS.

Quantizing a continuous pitch sequence into a series of notes is an active area of research and remains an open problem. Still, we can obtain fairly decent results using a series of heuristics:

  1. Convert the pitch sequence from Hertz to (fractional) MIDI note numbers
  2. Round each value to the nearest integer MIDI note number
  3. Optionally apply a median filter to smooth out short jumps in pitch (e.g. due to vibrato)
  4. Iterate over the sequence and whenever the pitch changes start a new note
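The four heuristics above can be sketched in plain Python. This is only an illustration and not the code of the audio_to_midi_melodia script: the function names, the sample pitch values and the filter width are assumptions made for the example, and unvoiced frames are assumed to be marked with a frequency of 0 or less.

```python
import math
from statistics import median


def hz_to_midi(hz):
    """Convert a frequency in Hertz to the nearest MIDI note number (0 = no pitch)."""
    if hz <= 0:  # assumption: non-positive values mark unvoiced frames
        return 0
    # 69 is the MIDI number of A4 (440 Hz); an octave spans 12 semitones
    return int(round(69 + 12 * math.log2(hz / 440.0)))


def median_filter(values, width=3):
    """Smooth a sequence with a sliding median of odd width (edge values are repeated)."""
    if not values:
        return []
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(values))]


def segment_notes(midi_sequence):
    """Group equal consecutive values into (start_frame, length, midi_note) tuples."""
    notes = []
    start = 0
    for i in range(1, len(midi_sequence) + 1):
        if i == len(midi_sequence) or midi_sequence[i] != midi_sequence[start]:
            if midi_sequence[start] > 0:  # skip unvoiced stretches
                notes.append((start, i - start, midi_sequence[start]))
            start = i
    return notes


# Hypothetical pitch track, one value per analysis frame
pitch_hz = [0.0, 440.0, 441.0, 466.2, 439.5, 440.2, 523.3, 523.0, 524.1, 0.0]
rounded = [hz_to_midi(hz) for hz in pitch_hz]    # steps 1 and 2
smoothed = median_filter(rounded, width=3)       # step 3
print(segment_notes(smoothed))                   # step 4: [(1, 5, 69), (6, 3, 72)]
```

In this sample, the single frame at 466.2 Hz would produce a one-frame note (MIDI 70) without smoothing. The median filter absorbs it into the surrounding A4 (MIDI 69).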

After cloning the project, switch to the audio_to_midi_melodia directory:

cd audio_to_midi_melodia

Then you will be able to install the dependencies to run the project. For more information about this project, please visit the official repository on GitHub at https://github.com/justinsalamon/audio_to_midi_melodia.

3. Install dependencies

The project needs several libraries in order to work properly. You can install them all from the terminal with a single command that reads the requirements.txt file:

pip install -r requirements.txt

The packages that will be installed are:

  • soundfile: SoundFile is an audio library based on libsndfile, CFFI and NumPy.
  • resampy: Efficient sample rate conversion in Python. This package implements the band-limited sinc interpolation method for sampling rate conversion.
  • vamp: This module allows Python code to load and use native-code Vamp plugins (http://vamp-plugins.org) for audio feature analysis.
  • midiutil: A Python interface for writing multi-track MIDI Files.
  • jams: A JSON Annotated Music Specification for Reproducible MIR Research.
  • numpy: NumPy is the fundamental package for scientific computing with Python.
  • scipy: SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
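After the installation, a quick way to verify that everything is importable is to ask Python whether it can locate each module. The sketch below assumes that the import names match the package names listed above. The function name is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that Python cannot locate in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: each package is imported under the same name it is listed with
required = ["soundfile", "resampy", "vamp", "midiutil", "jams", "numpy", "scipy"]
print("Missing modules:", missing_modules(required))
```

An empty list means that all the dependencies were found.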

4. Generate MIDI file from WAV

After installing the dependencies, you can generate the MIDI version of your audio file. Provide the path to the WAV file that you want to convert as the first positional argument and the path of the output MIDI file as the second one, followed by the BPM:

python audio_to_midi_melodia.py ./input_audio.wav ./output_file.mid 120

Note that exporting to MIDI requires providing a BPM (beats per minute) value. You may define a value arbitrarily, estimate it manually, or estimate it automatically using one of the tempo estimation algorithms included in Essentia, Librosa or, if you'd like to stick to Vamp plugins, QMVP. You may also look up the tempo of the song online and provide that value. No BPM is required if you export to JAMS, which directly uses the note onset times estimated from the audio track. In this example, we set the BPM to 120 (third positional argument).
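The BPM is needed because note positions in a MIDI file are typically expressed in beats, while the times estimated from the audio are in seconds. The conversion can be sketched as follows. The function name and the sample values are hypothetical and not taken from the script.

```python
def seconds_to_beats(seconds, bpm):
    """Convert a time in seconds to a position in beats at the given tempo."""
    return seconds * bpm / 60.0


# A note that starts at 1.5 s and lasts 0.25 s, at 120 BPM
onset_beats = seconds_to_beats(1.5, 120)      # 3.0
duration_beats = seconds_to_beats(0.25, 120)  # 0.5
print(onset_beats, duration_beats)
```

A wrong BPM therefore does not change which notes are detected, only where they fall on the beat grid of the resulting file.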

You may also change the smoothing and the minimum duration of every note, and export the result in JAMS format as well:

python audio_to_midi_melodia.py infile outfile bpm [--smooth SMOOTH] [--minduration MINDURATION] [--jams]
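If you want to convert files from your own Python code, one option is to assemble the same command line as a list of arguments. The helper below is a hypothetical sketch, not part of the project. It only uses the positional arguments and options shown above, and it does not run anything until you pass the list to subprocess.run.

```python
def build_command(infile, outfile, bpm, smooth=None, minduration=None, jams=False):
    """Build the argument list for audio_to_midi_melodia.py (nothing is executed)."""
    cmd = ["python", "audio_to_midi_melodia.py", infile, outfile, str(bpm)]
    if smooth is not None:
        cmd += ["--smooth", str(smooth)]
    if minduration is not None:
        cmd += ["--minduration", str(minduration)]
    if jams:
        cmd.append("--jams")
    return cmd


# The option values below are arbitrary examples
cmd = build_command("./input_audio.wav", "./output_file.mid", 120, minduration=0.1, jams=True)
print(" ".join(cmd))

# To actually run it from the audio_to_midi_melodia directory:
# import subprocess
# subprocess.run(cmd, check=True)
```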


Happy coding!

Senior Software Engineer at Software Medico. Interested in programming since he was 14 years old, Carlos is a self-taught programmer and founder and author of most of the articles at Our Code World.