How to determine whether 2 images are equal or not with the perceptual hash in Python

A perceptual hash is a fingerprint string produced by a special algorithm from the visual content of an input picture. Two perceptual hashes can be compared by calculating their Hamming distance, which counts the number of individual bits that differ between them: the smaller the distance, the more alike the pictures are. Cryptographic hashing techniques such as MD5 or SHA1 are not suitable for comparing images, because the slightest change to the picture generates a totally different hash.
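To make the contrast concrete, here is a minimal sketch that uses only the standard library. The helper name hamming_distance, the two hexadecimal hashes and the sample bytes are made up for illustration; they do not come from the imagehash project.

```python
import hashlib


def hamming_distance(hex_a, hex_b):
    """Count the bits that differ between two equally long hexadecimal hashes."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 exactly where the two hashes disagree
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Two illustrative 64-bit perceptual hashes that differ in a single bit
print(hamming_distance("030f4f0f87070301", "030f4f0f87070303"))  # 1

# A cryptographic hash behaves differently: changing one byte of the input
# produces an unrelated digest, so the distance says nothing about similarity
original = b"pretend these bytes are an image file"
modified = b"pretend these bytes are an image filf"
md5_a = hashlib.md5(original).hexdigest()
md5_b = hashlib.md5(modified).hexdigest()
print(md5_a)
print(md5_b)
print(hamming_distance(md5_a, md5_b))
```

A distance of 0 means that the two hashes are identical, which is the case that the examples in this article check for.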

In this article, we'll show you how to generate different versions of a perceptual hash from pictures in Python.

1. Download imagehash project

In order to compare 2 images and verify whether they are perceptually the same using a perceptual hash in Python, we will rely on the imagehash project by @JohannesBuchner. This project is an image hashing library written in Python that supports:

  • average hashing (aHash)
  • perceptual hashing (pHash)
  • difference hashing (dHash)
  • wavelet hashing (wHash)

You can obtain the source code of this project with Git using the following command:

git clone https://github.com/JohannesBuchner/imagehash.git

After cloning it, you will be able to follow the rest of the tutorial. For more information about this library, please visit the official repository on GitHub.

2. Install dependencies

The imagehash project requires some dependencies to work properly. They can be installed easily with pip from the requirements list of the project (if pip is not installed, install it with sudo apt install python-pip). First, change to the project directory:

cd imagehash

And then install the dependencies with:

pip install -r conda-requirements.txt

The dependencies are:

  • six: Six provides simple utilities for wrapping over differences between Python 2 and Python 3. It is intended to support codebases that work on both Python 2 and 3 without modification. six consists of only one Python file, so it is painless to copy into a project.
  • Pillow: Pillow is the friendly PIL fork by Alex Clark and Contributors. PIL is the Python Imaging Library by Fredrik Lundh and Contributors.
  • numpy: NumPy is the fundamental package for scientific computing with Python. It contains, among other things, a powerful N-dimensional array object.
  • scipy: SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
  • pywavelets: PyWavelets is open source wavelet transform software for Python. It combines a simple high level interface with low level C and Cython performance.

Once the libraries are installed, you can use the project to generate the different hashes that it offers.
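If you want to confirm that the installation worked before running the examples, a quick check like the following may help. It is only a sketch: the helper name find_missing is made up for illustration, and the list assumes the usual import names of the packages above (Pillow is imported as PIL and PyWavelets as pywt).

```python
import importlib.util

# Import names of the dependencies listed above (assumed: Pillow -> PIL,
# PyWavelets -> pywt)
REQUIRED_MODULES = ["six", "PIL", "numpy", "scipy", "pywt"]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All dependencies are available")
```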

3. Comparing images

You can follow these examples to generate each of the perceptual hashes that this Python project supports:

A. Average hash

The easiest way to generate a perceptual hash, and probably the one that you will choose, can be implemented as shown in the following example:

# example_averagehash.py

# Import dependencies
from PIL import Image
import imagehash

# Create the Hash Object of the first image
HDBatmanHash = imagehash.average_hash(Image.open('batman_hd.jpg'))
print('Batman HD Picture: ' + str(HDBatmanHash))

# Create the Hash Object of the second image
SDBatmanHash = imagehash.average_hash(Image.open('batman_sd.jpg'))
print('Batman SD Picture: ' + str(SDBatmanHash))

# Compare hashes to determine whether the pictures are the same or not
if HDBatmanHash == SDBatmanHash:
    print("The pictures are perceptually the same !")
else:
    # Subtracting two hashes returns their Hamming distance as an integer,
    # so it has to be converted to a string before concatenating
    print("The pictures are different, distance: " + str(HDBatmanHash - SDBatmanHash))

In this case, running the script with python example_averagehash.py will generate the following output in the terminal:

Batman HD Picture: 030f4f0f87070301
Batman SD Picture: 030f4f0f87070301
The pictures are perceptually the same !

As both pictures show the same Batman image and the first one simply has a higher resolution than the second one, they generate the same hash 030f4f0f87070301, although they are not the same file! You can read a very detailed theoretical explanation of how the average hash is generated in this blog.
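The reason why the resolution does not matter can be illustrated with a simplified, pure Python sketch of the average hash idea: shrink the picture to a tiny grid, compute the mean brightness, and store one bit per cell. This is not the code of imagehash (which resizes real images with Pillow); the function names and the synthetic pixel grids below are assumptions made for illustration.

```python
def box_downscale(pixels, size=8):
    """Shrink a square grayscale grid to size x size by averaging equal blocks."""
    step = len(pixels) // size
    return [
        [
            sum(pixels[y * step + dy][x * step + dx]
                for dy in range(step) for dx in range(step)) / (step * step)
            for x in range(size)
        ]
        for y in range(size)
    ]


def average_hash_sketch(pixels, size=8):
    """One bit per cell, set when the cell is brighter than the mean of all cells."""
    small = box_downscale(pixels, size)
    flat = [value for row in small for value in row]
    mean = sum(flat) / len(flat)
    bits = "".join("1" if value > mean else "0" for value in flat)
    return format(int(bits, 2), "0{}x".format(size * size // 4))


# A made-up 8x8 "SD" picture and a 16x16 "HD" version with every pixel doubled
sd = [[x * 30 + y * 5 for x in range(8)] for y in range(8)]
hd = [[sd[y // 2][x // 2] for x in range(16)] for y in range(16)]

print(average_hash_sketch(sd) == average_hash_sketch(hd))  # True
```

Both grids collapse to the same 8x8 thumbnail, so the resulting hashes are identical even though the inputs have different sizes.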

B. Perceptual hashing (pHash)

The perceptual hash computation, which follows this implementation, can be used like this:

# example_phash.py

# Import dependencies
from PIL import Image
import imagehash

# Create the Hash Object of the first image
HDBatmanHash = imagehash.phash(Image.open('batman_hd.jpg'))
print('Batman HD Picture: ' + str(HDBatmanHash))

# Create the Hash Object of the second image
SDBatmanHash = imagehash.phash(Image.open('batman_sd.jpg'))
print('Batman SD Picture: ' + str(SDBatmanHash))

# Compare hashes to determine whether the pictures are the same or not
if HDBatmanHash == SDBatmanHash:
    print("The pictures are perceptually the same !")
else:
    # Subtracting two hashes returns their Hamming distance as an integer,
    # so it has to be converted to a string before concatenating
    print("The pictures are different, distance: " + str(HDBatmanHash - SDBatmanHash))

In this case, running the script with python example_phash.py will generate the following output in the terminal:

Batman HD Picture: a8d14ab75aa9c62b
Batman SD Picture: a8d14ab75aa9c62b
The pictures are perceptually the same !
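For readers curious about what happens behind this kind of hash, the following pure Python sketch illustrates the general DCT-based idea: transform a 32x32 grayscale grid with a discrete cosine transform, keep the 8x8 block of lowest frequencies, and set one bit per coefficient that is above the median. It is a simplified illustration under those assumptions and not necessarily identical to what imagehash does internally; the function names and the synthetic grids are made up.

```python
import math
import statistics


def dct_1d(values):
    """Unnormalised DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(grid):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in grid]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def phash_sketch(grid, hash_size=8):
    """One bit per low-frequency coefficient, set when it exceeds the median."""
    coeffs = dct_2d(grid)
    low = [c for row in coeffs[:hash_size] for c in row[:hash_size]]
    median = statistics.median(low)
    bits = "".join("1" if c > median else "0" for c in low)
    return format(int(bits, 2), "0{}x".format(hash_size * hash_size // 4))


# A made-up 32x32 picture and a copy that is exactly twice as bright
dark = [[(x * 3 + y * 2) % 128 for x in range(32)] for y in range(32)]
bright = [[2 * value for value in row] for row in dark]

print(phash_sketch(dark) == phash_sketch(bright))  # True
```

Doubling the brightness doubles every coefficient and the median alike, so each comparison keeps its result and the hash does not change.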

C. Difference hashing (dHash)

The difference hash computation, which follows this implementation, can be used like this:

# example_dhash.py

# Import dependencies
from PIL import Image
import imagehash

# Create the Hash Object of the first image
HDBatmanHash = imagehash.dhash(Image.open('batman_hd.jpg'))
print('Batman HD Picture: ' + str(HDBatmanHash))

# Create the Hash Object of the second image
SDBatmanHash = imagehash.dhash(Image.open('batman_sd.jpg'))
print('Batman SD Picture: ' + str(SDBatmanHash))

# Compare hashes to determine whether the pictures are the same or not
if HDBatmanHash == SDBatmanHash:
    print("The pictures are perceptually the same !")
else:
    # Subtracting two hashes returns their Hamming distance as an integer,
    # so it has to be converted to a string before concatenating
    print("The pictures are different, distance: " + str(HDBatmanHash - SDBatmanHash))

In this case, running the script with python example_dhash.py will generate the following output in the terminal:

Batman HD Picture: bf9f97372c2ebbb3
Batman SD Picture: bf9f97372c2ebbb3
The pictures are perceptually the same !
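The idea behind a difference hash can be sketched in a few lines of pure Python: instead of comparing each pixel with a global value, every pixel is compared with its right-hand neighbour. The sketch below assumes that the picture has already been reduced to a small grayscale grid of 8 rows and 9 columns; the function name and the sample grids are made up for illustration and this is not the code of imagehash.

```python
def dhash_sketch(grid):
    """One bit per horizontal neighbour pair, set when the right pixel is brighter."""
    bits = "".join(
        "1" if right > left else "0"
        for row in grid
        for left, right in zip(row, row[1:])
    )
    return format(int(bits, 2), "0{}x".format(len(bits) // 4))


# Made-up grid of 8 rows x 9 columns that gets brighter from left to right
gradient = [[x * 10 + y for x in range(9)] for y in range(8)]
print(dhash_sketch(gradient))  # ffffffffffffffff

# Mirroring the grid makes it darker from left to right, so every bit flips
mirrored = [list(reversed(row)) for row in gradient]
print(dhash_sketch(mirrored))  # 0000000000000000
```

Because only the direction of the brightness change is stored, 9 columns yield 8 bits per row and 64 bits in total.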

D. Wavelet hashing (wHash)

The wavelet hash computation, which follows this implementation, can be used like this:

# example_whash.py

# Import dependencies
from PIL import Image
import imagehash

# Create the Hash Object of the first image
HDBatmanHash = imagehash.whash(Image.open('batman_hd.jpg'))
print('Batman HD Picture: ' + str(HDBatmanHash))

# Create the Hash Object of the second image
SDBatmanHash = imagehash.whash(Image.open('batman_sd.jpg'))
print('Batman SD Picture: ' + str(SDBatmanHash))

# Compare hashes to determine whether the pictures are the same or not
if HDBatmanHash == SDBatmanHash:
    print("The pictures are perceptually the same !")
else:
    # Subtracting two hashes returns their Hamming distance as an integer,
    # so it has to be converted to a string before concatenating
    print("The pictures are different, distance: " + str(HDBatmanHash - SDBatmanHash))

In this case, running the script with python example_whash.py will generate the following output in the terminal:

Batman HD Picture: 074fdfdf87070301
Batman SD Picture: 074fdfdf87070301
The pictures are perceptually the same !

Happy coding !
