How to implement a fuzzy search (fulltext) engine using TNTSearch in Symfony 5

How to implement a fuzzy search (fulltext) engine using TNTSearch in Symfony 5

If your Symfony 5 application uses MySQL as the default database manager, you know how difficult it is to develop a fuzzy search feature in your database and even worst when you are working with Doctrine. The best and most effective solution would be to migrate the data that you need to search for to elasticsearch, however if you don't want to implement new technologies in your project you may stick with the default and good known ones. Just like there are tools to make PDF's natively with just PHP (like TCPDF), for the implementation of fuzzy searching in your PHP project exists TNTSearch.

TNTSearch is a fully featured full text search engine written entirely in PHP. Its simple configuration allows you to add an amazing search experience to your site in just minutes. It has also a build in geo-search and a text classifier. The most known features of this project are:

  • fuzzy search
  • as you type functionality
  • geo-search
  • text-classification
  • stemming
  • custom tokenizers
  • bm25 ranking algorithm (yeaah very important scientific stuff that we may ignore initially)
  • boolean search
  • result highlighting

In this article, we will explain you how to implement this fuzzy searching feature in your application 

1. Install TNTSearch

Just as Symfony 5, the TNTSearch library requires the following extensions to be installed on your system:

  • PHP >= 7.1
  • PDO PHP Extension
  • SQLite PHP Extension
  • mbstring PHP Extension

During the installation, composer will verify anyway if the mentioned extensions are available on the installed PHP version and will install it or not. Open a terminal, switch to the root directory of your symfony project and install the library with the following command:

composer require teamtnt/tntsearch

The package has bunch of helper functions like jaro-winkler and cosine similarity for distance calculations. It supports stemming for English, Croatian, Arabic, Italian, Russian, Portuguese and Ukrainian. After the installation of the package, we will be able to proceed with the creation of our fuzzy search engine.

For more information about this project, visit the official repository at Github here.

2. Create fuzzy search index file

Now to get started, you need to understand basically how the library functionality works. As first, you will need to create an index file with all the data of your database that you want to expose to be fuzzy searched. For example, in this article we will have a single table in our MySQL database that corresponds to a collection of world wide known musicians, we can see the table through PHPMyAdmin for example:

Artists Table MySQL Database FuzzySearch

Now that we know what our index will contain, we will start writing some code in our project to generate it. Esentially, the TNTSearch library requires direct connection to your MySQL database as shown in the following example:

use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;

$tnt->loadConfig([
    'driver'    => 'mysql',
    'host'      => 'localhost',
    'database'  => 'dbname',
    'username'  => 'user',
    'password'  => 'pass',
    'storage'   => '/var/www/tntsearch/examples/',
    'stemmer'   => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class//optional
]);

However, as you are working with a Symfony 5 project, you know that the database configuration is stored in the .env file in the root directory of your project and the line looks like this:

DATABASE_URL=mysql://db_user:db_password@127.0.0.1:3306/db_name?serverVersion=5.7

So, in order to obtain the information in Symfony, we will create the following method (this tutorial will be followed inside a single controller, just for learning purposes, you are free to modify the logic either exposing the generation of the index through a command in case that it doesn't need to be modified constantly):

<?php

// src/Controller/DefaultController.php
namespace App\Controller;

use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Routing\Annotation\Route;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;

class DefaultController extends AbstractController
{    
    /**
     * Returns an array with the configuration of TNTSearch with the
     * database used by the Symfony project.
     * 
     * @return type
     */
    private function getTNTSearchConfiguration(){
        $databaseURL = $_ENV['DATABASE_URL'];
        
        $databaseParameters = parse_url($databaseURL);
        
        $config = [
            'driver'    => $databaseParameters["scheme"],
            'host'      => $databaseParameters["host"],
            'database'  => str_replace("/", "", $databaseParameters["path"]),
            'username'  => $databaseParameters["user"],
            'password'  => $databaseParameters["pass"],
            // In Windows:
            // C:\\xampp738\\htdocs\\myproject/fuzzy_storage
            // Or Linux:
            // /var/www/vhosts/myproject/fuzzy_storage
            // 
            // Create the fuzzy_storage directory in your project to store the index file
            'storage'   => '/var/www/vhosts/myproject/fuzzy_storage/',
            // A stemmer is optional
            'stemmer'   => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class
        ];
        
        return $config;
    }
}

As the method explains, it will return the array with the configuration of the database that we will use to create the index with TNTSearch. We mentioned as well that you need to create in this tutorial, the fuzzy_storage directory in the root directory of your project as we will save the indexes in there, however you are free to change it according to your needs. Having now the configuration object, we will proceed with the creation of our first route in the controller, that will allow you to create the index file. Our route will be /generate-index and will have the following code (we will omit the getTNTSearchConfiguration method in our examples, however it should exist):

<?php

// src/Controller/DefaultController.php
namespace App\Controller;

use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Routing\Annotation\Route;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;

// Import TNTSearch
use TeamTNT\TNTSearch\TNTSearch;

class DefaultController extends AbstractController
{
    /**
     * @Route("/generate-index", name="app_generate-index")
     */
    public function generate_index()
    {
        $tnt = new TNTSearch;

        // Obtain and load the configuration that can be generated with the previous described method
        $configuration = $this->getTNTSearchConfiguration();
        $tnt->loadConfig($configuration);

        // The index file will have the following name, feel free to change it as you want
        $indexer = $tnt->createIndex('artists.index');
        
        // The result with all the rows of the query will be the data
        // that the engine will use to search, in our case we only want 2 columns
        // (note that the primary key needs to be included)
        $indexer->query('SELECT id, name, slug FROM artists;');
        
        // Generate index file !
        $indexer->run();

        return new Response(
            '<html><body>Index succesfully generated !</body></html>'
        );
    }

    /// ... ///
}

As result visiting the project in the web browser http://app/generate-index will return the following output (as we have 7400 rows in the table):

Processed 1000 rows 
Processed 2000 rows 
Processed 3000 rows 
Processed 4000 rows 
Processed 5000 rows 
Processed 6000 rows 
Processed 7000 rows 
Total rows 7435 Index succesfully generated !

And in the mentioned directory you will find the index file that can be used to fuzzy search the data:

TNTSearch Index File Symfony 5

Now that we have the index, we can start fuzzy searching.

3. Fuzzy searching

We're almost there! As mentioned, now with the index, we can easily search for whatever the user is looking for. In our example controller, we will have a /search route that will display the results returned by the search method of TNTSearch. This can be achieved by simply initializing a TNTSearch instance with your configuration, selecting the index file with the selectIndex method and then running the search method from our instance. This method expects as arguments the search string and as second parameter the limit of results which is by default 100. 

We will return the result as a JSON string as described in the following controller:

<?php

// src/Controller/DefaultController.php
namespace App\Controller;

use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Routing\Annotation\Route;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;

// Import TNTSearch
use TeamTNT\TNTSearch\TNTSearch;
use Symfony\Component\HttpFoundation\JsonResponse;

class DefaultController extends AbstractController
{
    /**
     * @Route("/search", name="app_search")
     */
    public function search()
    {
        $tnt = new TNTSearch;

        // Obtain and load the configuration that can be generated with the previous described method
        $configuration = $this->getTNTSearchConfiguration();
        $tnt->loadConfig($configuration);
        
        // Use the generated index in the previous step
        $tnt->selectIndex('artists.index');
        
        $maxResults = 20;
        
        // Search for a band named like "Guns n' roses"
        $results = $tnt->search("Gans n rosas", $maxResults);
        
        return new JsonResponse($results);
    } 
    
    // ... //
}

The response of this controller will be an array returned by the search method of TNTSearch that has the following structure:

{
  "ids": [
    946,
    2990,
    4913,
    5564,
    5751,
    1924,
    4794,
    5541,
    5560,
    5725,
    5757,
    6581,
    7370
  ],
  "hits": 13,
  "execution_time": "1.087 ms"
}

As you can see, TNTSearch returns only a collection of the primary key of the rows in the table of the database that best match the search criteria. Normally, in a Symfony project you will be using Doctrine to manipulate the data from your database, so you will need to search the items by its primary key through Doctrine. For example, in our project we have an entity with all the fields of the database identified as:

<?php

// /src/Entity/Artists.php
namespace App\Entity;

use Doctrine\ORM\Mapping as ORM;

/**
 * Artists
 *
 * @ORM\Table(name="artists", indexes={@ORM\Index(name="first_character", columns={"first_character"})})
 * @ORM\Entity
 */
class Artists
{
    /**
     * @var int
     *
     * @ORM\Column(name="id", type="bigint", nullable=false)
     * @ORM\Id
     * @ORM\GeneratedValue(strategy="IDENTITY")
     */
    private $id;

    /**
     * @var string
     *
     * @ORM\Column(name="name", type="string", length=255, nullable=false)
     */
    private $name;

    /**
     * @var string
     *
     * @ORM\Column(name="slug", type="string", length=255, nullable=false)
     */
    private $slug;

    /**
     * @var string|null
     *
     * @ORM\Column(name="first_character", type="string", length=1, nullable=true, options={"default"="NULL"})
     */
    private $firstCharacter = 'NULL';

    /**
     * @var string|null
     *
     * @ORM\Column(name="description", type="text", length=65535, nullable=true, options={"default"="NULL"})
     */
    private $description = 'NULL';

    /**
     * @var string|null
     *
     * @ORM\Column(name="image", type="string", length=255, nullable=true, options={"default"="NULL"})
     */
    private $image = 'NULL';

    public function getId(): ?string
    {
        return $this->id;
    }

    public function getName(): ?string
    {
        return $this->name;
    }

    public function setName(string $name): self
    {
        $this->name = $name;

        return $this;
    }

    public function getSlug(): ?string
    {
        return $this->slug;
    }

    public function setSlug(string $slug): self
    {
        $this->slug = $slug;

        return $this;
    }

    public function getFirstCharacter(): ?string
    {
        return $this->firstCharacter;
    }

    public function setFirstCharacter(?string $firstCharacter): self
    {
        $this->firstCharacter = $firstCharacter;

        return $this;
    }

    public function getDescription(): ?string
    {
        return $this->description;
    }

    public function setDescription(?string $description): self
    {
        $this->description = $description;

        return $this;
    }

    public function getImage(): ?string
    {
        return $this->image;
    }

    public function setImage(?string $image): self
    {
        $this->image = $image;

        return $this;
    }
}

So we could simply search for every item through the repository by its id. It's pretty important to keep in mind that TNTSearch returns the results by the best matches first, so that's why we search for every single item and store them into an array like described in the following code:

<?php

// src/Controller/DefaultController.php
namespace App\Controller;

use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Routing\Annotation\Route;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;

// Import TNTSearch
use TeamTNT\TNTSearch\TNTSearch;
use Symfony\Component\HttpFoundation\JsonResponse;
use App\Entity\Artists;

class DefaultController extends AbstractController
{
    /**
     * @Route("/search", name="app_search")
     */
    public function search()
    {
        $em = $this->getDoctrine()->getManager();
        
        $tnt = new TNTSearch;

        // Obtain and load the configuration that can be generated with the previous described method
        $configuration = $this->getTNTSearchConfiguration();
        $tnt->loadConfig($configuration);
        
        // Use the generated index in the previous step
        $tnt->selectIndex('artists.index');
        
        $maxResults = 20;
        
        // Search for a band named like "Guns n' roses"
        $results = $tnt->search("Gans n rosas", $maxResults);
        
        // Keep a reference to the Doctrine repository of artists
        $artistsRepository = $em->getRepository(Artists::class);
        
        // Store the results in an array
        $rows = [];
        
        foreach($results["ids"] as $id){
            $artist = $artistsRepository->find($id);
            
            $rows[] = [
                'id' => $artist->getId(),
                'name' => $artist->getName()
            ];
        }
        
        // Return the results to the user
        return new JsonResponse($rows);
    }
}

Then in our search route, we will have a response like the following one:

[
  {
    "id": "946",
    "name": "Black N Blue"
  },
  {
    "id": "2990",
    "name": "Guns N' Roses"
  },
  {
    "id": "4913",
    "name": "Nigrino, N"
  },
  {
    "id": "5564",
    "name": "Rage N Rox"
  },
  {
    "id": "5751",
    "name": "Robin N Looza"
  },
  {
    "id": "1924",
    "name": "Demon n'Angel"
  },
  {
    "id": "4794",
    "name": "N.E.R.D"
  },
  {
    "id": "5541",
    "name": "R.I.S.E.N."
  },
  {
    "id": "5560",
    "name": "Rag'n'Bone Man"
  },
  {
    "id": "5725",
    "name": "Rínon Nínqueon"
  },
  {
    "id": "5757",
    "name": "Rock'n'Roll Soldiers"
  },
  {
    "id": "6581",
    "name": "T.N.F."
  },
  {
    "id": "7370",
    "name": "Youssou N'Dour feat. Neneh Cherry"
  }
]

As expected, even when we wrote Guns 'n roses in the wrong way, in our top results the correct band was in the list! Thanks to this library you will be implementing fuzzy searching in your project within minutes and just using PHP.

Happy coding !

This could interest you

Become a more social person