In audio recognition, when you want to identify a song, an application like Shazam quickly turns your clip into an audio fingerprint generated by a special algorithm. Its database holds the fingerprints of more than 8 million songs, generated with the same algorithm, so the rest is just a matter of numerical data search and pattern matching in the background. Shazam searches its library for the code it created from your recording; when it finds a match, it has found your song! But is it possible for a developer who doesn't work for this company to implement something like this? The answer is yes, and there are many implementations of the algorithm that work pretty well.
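To make the "search and pattern matching" idea more concrete, here is a toy sketch in Python. It assumes each fingerprint is a list of `(hash, offset)` pairs and uses offset-alignment voting, a common technique in landmark-based audio fingerprinting. It is not Shazam's code, and the project used later in this tutorial may do this differently; the names `build_index` and `best_match` are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(songs):
    """Map each hash to the (song_id, offset) pairs where it occurs.

    `songs` maps a song id to a list of (hash, offset) tuples (assumed format).
    """
    index = defaultdict(list)
    for song_id, fingerprints in songs.items():
        for h, offset in fingerprints:
            index[h].append((song_id, offset))
    return index


def best_match(index, sample):
    """Return (song_id, votes) for the song whose hashes line up best with `sample`.

    Every hash shared by the sample and a song casts one vote for the pair
    (song_id, song_offset - sample_offset). Returns (None, 0) if nothing matches.
    """
    votes = Counter()
    for h, sample_offset in sample:
        for song_id, song_offset in index.get(h, []):
            votes[(song_id, song_offset - sample_offset)] += 1
    if not votes:
        return None, 0
    (song_id, _delta), count = votes.most_common(1)[0]
    return song_id, count
```

A genuine match produces many shared hashes that agree on the same offset difference, while accidental hash collisions scatter across different differences. That is why the vote is on `(song_id, delta)` rather than on `song_id` alone.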
In this tutorial, we'll explain how to implement your own Shazam-like application in Python on Ubuntu 18.04 to recognize songs from your own song database.
Requirements
To follow this tutorial, you will need:
- An Ubuntu Desktop environment (18.04 in this case).
- An installed and working microphone (the default script recognizes audio from the microphone, so you need one for a fully working example).
- Python 2 installed (sudo apt install python2.7)
- pip (sudo apt install python-pip)
Having said that, let's get started!
1. Install required packages
To implement our own Shazam-like application, we need to install all the dependencies that the project requires. As a first step, install the Ubuntu-specific dependencies:
sudo apt-get install python-tk
sudo apt install ffmpeg
sudo apt-get install portaudio19-dev python-pyaudio
After installing the previous packages on your system, proceed with the installation of the packages required for Python.
Python Packages
To work properly, the Python side of the project needs a few important libraries, which you can install with the following command (pip needs to be installed):
pip install matplotlib termcolor scipy pydub PyAudio
These packages are:
- matplotlib: Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
- termcolor: Termcolor is a library for printing colored messages to the terminal.
- scipy: SciPy is open-source software for mathematics, science, and engineering.
- pydub: Pydub is a package for manipulating audio through a simple, easy-to-use, high-level interface.
- PyAudio: provides Python bindings for PortAudio, the cross platform audio API.
Once they're installed, you may proceed with the implementation of the project.
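Before moving on, you can quickly confirm that everything is in place. The following is a small, hypothetical helper script (not part of the project) that tries to import each Python package installed above and looks for the ffmpeg binary on your PATH. It avoids Python 3-only syntax so it should also run under Python 2.7; note that PyAudio is imported as `pyaudio`.

```python
from __future__ import print_function

import importlib
import os

# Import names of the packages installed above (PyAudio is imported as "pyaudio").
REQUIRED_MODULES = ["matplotlib", "termcolor", "scipy", "pydub", "pyaudio"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def find_executable(name):
    """Return the full path of executable `name` on PATH, or None if not found."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing Python modules:", ", ".join(missing) if missing else "none")
    print("ffmpeg found at:", find_executable("ffmpeg") or "NOT FOUND")
```

If a module is reported as missing, rerun the corresponding pip or apt command from the steps above.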
2. Clone audio-fingerprint-identifying-python project
In this implementation, we will use the open source project audio-fingerprint-identifying-python, available on GitHub. This project is a Shazam-like app that identifies songs using audio fingerprints, spectrum analysis, and the Fast Fourier Transform. The project was built by @itspoma; you can find more information about it here:
- conference PaceMaker: BackEnd-2016 conference
- slides are on slideshare.net/rodomansky/ok-shazam-la-lalalaa
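To give an idea of what "spectrum analysis" means here, the sketch below splits a signal into frames, computes each frame's frequency spectrum, and keeps the strongest frequency bin per frame as a key point. For the sake of a dependency-free example it uses a naive O(N²) discrete Fourier transform; real implementations use an FFT (for example `numpy.fft`) and usually keep several local peaks per frame. The function names are hypothetical and this is not the project's own code.

```python
from __future__ import division

import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the first len(frame)//2 DFT bins (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
    return mags


def strongest_bins(samples, frame_size):
    """Split `samples` into non-overlapping frames and return a list of
    (frame_index, bin) pairs, one per frame, for its strongest frequency bin."""
    peaks = []
    for i in range(len(samples) // frame_size):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        mags = dft_magnitudes(frame)
        peaks.append((i, max(range(len(mags)), key=mags.__getitem__)))
    return peaks
```

A bin index converts to hertz as `bin * sample_rate / frame_size`: with a sample rate of 800 Hz and 64-sample frames, a pure 100 Hz tone peaks at bin 8 in every frame, and 8 * 800 / 64 = 100 Hz.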
Clone the repository into a directory of your choice with the following command:
git clone https://github.com/itspoma/audio-fingerprint-identifying-python.git
For more information about this project, please visit the official repository on GitHub. After downloading the project, change into its directory (cd audio-fingerprint-identifying-python) and create (or reset) the database for the first time by running the following command:
make clean reset
The project uses SQLite to store the hashes extracted from the song files.
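As an illustration of how hashes can be stored and queried in SQLite, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`songs`, `fingerprints`, `song_offset`) and the helper functions are hypothetical; the project's actual schema may differ. The index on `hash` is what makes lookups fast when matching a recording.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical tables for songs and their (hash, offset) fingerprints."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS songs (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS fingerprints (
            hash        TEXT    NOT NULL,
            song_id     INTEGER NOT NULL REFERENCES songs(id),
            song_offset INTEGER NOT NULL
        );
        CREATE INDEX IF NOT EXISTS idx_fingerprints_hash ON fingerprints(hash);
    """)


def add_song(conn, name, fingerprints):
    """Insert a song and its list of (hash, offset) pairs; return the song id."""
    cur = conn.execute("INSERT INTO songs (name) VALUES (?)", (name,))
    song_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO fingerprints (hash, song_id, song_offset) VALUES (?, ?, ?)",
        [(h, song_id, offset) for h, offset in fingerprints])
    conn.commit()
    return song_id


def lookup(conn, hashes):
    """Return (song name, hash, song_offset) rows for every stored occurrence of `hashes`."""
    if not hashes:
        return []
    placeholders = ",".join("?" * len(hashes))
    rows = conn.execute(
        "SELECT s.name, f.hash, f.song_offset FROM fingerprints f "
        "JOIN songs s ON s.id = f.song_id "
        "WHERE f.hash IN ({}) ORDER BY s.name, f.song_offset".format(placeholders),
        list(hashes))
    return rows.fetchall()


def stats(conn):
    """Return (number of songs, number of stored hashes)."""
    songs = conn.execute("SELECT COUNT(*) FROM songs").fetchone()[0]
    hashes = conn.execute("SELECT COUNT(*) FROM fingerprints").fetchone()[0]
    return songs, hashes
```

For experiments, `sqlite3.connect(":memory:")` gives a throwaway database; pass a file path instead to keep the data between runs.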
3. Store some MP3 files in the database directory
Inside the cloned project directory, create a new folder named mp3. In this folder, store the audio files that you want to use as references in your database.
This means that your application will identify only the songs stored in this directory. Once you have them, you will be able to generate the fingerprints that we'll use later.
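If you want to double-check which files will serve as references, a hypothetical helper like the one below (not part of the project) lists the audio files in the mp3 folder. It assumes you run it from the project directory and, as the step's name suggests, that the references are .mp3 files; extend `extensions` if your setup differs.

```python
import os


def reference_files(directory="mp3", extensions=(".mp3",)):
    """Return the sorted paths of regular files in `directory` whose names end
    with one of `extensions` (case-insensitive)."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(extensions)
        and os.path.isfile(os.path.join(directory, name)))
```

For example, `print(reference_files())` prints the list of paths found in ./mp3.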
4. Generate fingerprints of the MP3 files
Now that you have your song database, it helps to understand the most important step of the algorithm: it determines some key points in each song, saves those points as hashes, and later tries to match them against the SQLite database. Generate those hashes and store them in the database with the following command:
python collect-fingerprints-of-songs.py
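One common way to turn key points into hashes, used by landmark-based fingerprinting schemes, is to pair each peak with a few peaks that follow it and hash the two frequencies together with the time gap between them. The sketch below illustrates that idea; it is an assumption for illustration, not necessarily the exact hashing the project uses, and `pair_hashes`, `fan_out`, and `max_delta` are hypothetical names. Because only the time gap enters the hash, the same song produces the same hashes no matter where in the song your recording starts.

```python
import hashlib


def pair_hashes(peaks, fan_out=3, max_delta=10):
    """Turn (time, freq_bin) peaks into a list of (hash, anchor_time) pairs.

    Each peak (the anchor) is paired with up to `fan_out` following peaks that
    occur 1..`max_delta` time steps later. The pair is summarised as
    "freq1|freq2|delta" and hashed with SHA-1 (truncated to 20 hex characters).
    """
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            delta = t2 - t1
            if 0 < delta <= max_delta:
                key = "{}|{}|{}".format(f1, f2, delta).encode("utf-8")
                hashes.append((hashlib.sha1(key).hexdigest()[:20], t1))
    return hashes
```

The anchor time stored next to each hash is what the matching step later compares against the offsets found in your recording.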
The algorithm will analyze every channel of each audio file and store the resulting hashes in the database.
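Analyzing "every channel" requires separating the channels first. Decoded PCM audio is typically interleaved (for stereo: left, right, left, right, ...), so each channel can be recovered by taking every n-th sample. The helper below is a hypothetical sketch of that step, not the project's code.

```python
def split_channels(samples, channels):
    """Split interleaved PCM samples [L0, R0, L1, R1, ...] into one list per channel."""
    if len(samples) % channels:
        raise ValueError("sample count is not a multiple of the channel count")
    # Channel ch starts at index ch and repeats every `channels` samples.
    return [list(samples[ch::channels]) for ch in range(channels)]
```

With pydub, for instance, `split_channels(seg.get_array_of_samples(), seg.channels)` would split an `AudioSegment` named `seg` into per-channel sample lists, each of which can then be fingerprinted separately.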
With this information, you will now be able to recognize audio from the microphone or a file and check whether it matches any of the songs in your database. To check which songs are in the database, run the following command:
python get-database-stat.py
It will print all the songs in your database to the terminal.
5. Recognizing audio from microphone
At the time of writing, it's only possible to recognize audio from the microphone installed on your computer. To start recognizing, run the Python script meant for the microphone and pass, with the -s argument, how many seconds of audio it should record:
python recognize-from-microphone.py -s 5
The recommended minimum is 5 seconds, which gives the algorithm enough audio to recognize the correct song reliably. After you play a song from your database on a mobile device near the computer's microphone, the script will recognize it successfully once the 5 seconds have passed.
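For the curious, recording a fixed number of seconds with PyAudio usually looks like the minimal sketch below. It is not the project's recognize-from-microphone.py; the function names and the settings (44100 Hz, mono, 16-bit samples, 1024-frame chunks) are assumptions chosen for illustration. PyAudio is imported inside `record` so the other helpers work without it.

```python
from __future__ import division

import math
import wave

RATE = 44100     # samples per second (assumed)
CHUNK = 1024     # frames read from the stream per call (assumed)
CHANNELS = 1     # mono recording (assumed)


def chunks_needed(seconds, rate=RATE, chunk=CHUNK):
    """Number of chunk-sized reads required to cover `seconds` of audio."""
    return int(math.ceil(seconds * rate / chunk))


def record(seconds, rate=RATE, chunk=CHUNK, channels=CHANNELS):
    """Record `seconds` of 16-bit audio from the default input device; return raw bytes."""
    import pyaudio  # imported lazily so the helpers above work without PyAudio
    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=channels, rate=rate,
                        input=True, frames_per_buffer=chunk)
    try:
        frames = [stream.read(chunk) for _ in range(chunks_needed(seconds, rate, chunk))]
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
    return b"".join(frames)


def save_wav(path, data, rate=RATE, channels=CHANNELS):
    """Write raw 16-bit PCM bytes to a WAV file."""
    wf = wave.open(path, "wb")
    wf.setnchannels(channels)
    wf.setsampwidth(2)  # paInt16 means 2 bytes per sample
    wf.setframerate(rate)
    wf.writeframes(data)
    wf.close()
```

For example, `save_wav("sample.wav", record(5))` records 5 seconds from the default microphone (216 chunks of 1024 frames at 44100 Hz) and saves them to sample.wav.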
And there you go: an easy-to-implement project that identifies songs like Shazam does.
Happy coding!