Fuzzy searching is a very important feature of Web search engines. You can implement fuzzy text searching within your MySQL database by using a combination of built-in user functions like match against etc.
To match results even with mispelled input, we can use the SOUNDEX native MySQL function to catch them too. In this article, you'll learn how to implement the usage of Soundex for doctrine in a Symfony project.
Note: this tutorial works both for Symfony 2.x and Symfony 3.x.
Implementation
To get started, locate yourself in the root folder of your bundle and create a folder named Extensions (or the root directory /src
), then create a folder inside named Doctrine. Create inside the doctrine folder a new class named SoundexFunction.php
and save the following code inside.
Don't forget to change the namespace according to the location inside your project.
<?php
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
namespace ourcodeworld\Extensions\Doctrine;
use Doctrine\ORM\Query\AST\Functions\FunctionNode;
use Doctrine\ORM\Query\Lexer;
/**
* SoundexFunction ::= "SOUNDEX" "(" StringPrimary ")"
*/
class SoundexFunction extends FunctionNode
{
public $stringExpression = null;
public function parse(\Doctrine\ORM\Query\Parser $parser)
{
$parser->match(Lexer::T_IDENTIFIER);
$parser->match(Lexer::T_OPEN_PARENTHESIS);
$this->stringExpression = $parser->StringPrimary();
$parser->match(Lexer::T_CLOSE_PARENTHESIS);
}
public function getSql(\Doctrine\ORM\Query\SqlWalker $sqlWalker)
{
return 'SOUNDEX(' .
$this->stringExpression->dispatch($sqlWalker) .
')';
}
}
The class will allow you to use the SOUNDEX statement in our queries without throw any error like:
Now the class exists in our project, but it's not registered. In order to use it, we need to register the function in the config.yml
file of our project, just go the doctrine area and save the SOUNDEX
property with the class path in the dql property of the ORM.
# app/config/config.yml
# Doctrine Configuration
doctrine:
# Search for the ORM property
orm:
# Those values should be already in your file and this doesn't matter
auto_generate_proxy_classes: "%kernel.debug%"
naming_strategy: doctrine.orm.naming_strategy.underscore
auto_mapping: true
# We need this the dql property to register the custom doctrine functions :
dql:
string_functions:
# Match agains should have the path to the SOUNDEX class created in the previous step
SOUNDEX: myBundle\Extensions\Doctrine\SoundexFunction
And you're ready to go! clear the cache in case you're working in the prod environment and let's create some queries.
Usage
The SOUNDEX clause is available for it's use, you can use it within a query builder or plain DQL.
<?php
class LandingController extends Controller
{
public function indexAction(Request $request)
{
$em = $this->getDoctrine()->getManager();
$repository = $em->getRepository('myBundle:Entity');
$results = $repository->createQueryBuilder('a')
->where("SOUNDEX(a.table_field) LIKE SOUNDEX(:search)")
// To use SOUNDEX with another methods like MATCH_AGAINST
// You can use the orWhere('SOUN....') clause instead of where
// In case that you don't want to use parameter, you can set directly the string into the query
//->where("SOUNDEX(a.table_field) LIKE SOUNDEX('%Title to search mispillid%')")
->setParameter('search','%Title to search mispillid%')
->getQuery()
->getResult();
dump($results);
}
}
Note that we use %text%
to match all the records that contain that text. You don't need to use wildcards if you want.
Understanding how soundex works
Given the following table (with entity Breed
) in the database :
ID | NAME |
---|---|
1 | Affenpinscher |
2 | Alaskan Malamute |
3 | Airedale Terrier |
4 | Akita |
5 | Australian Shepherd |
The user is able to type wrong the dog breed, however a match will be returned i.e :
User input | Expected result |
---|---|
afhenpinsher | 1 - Affenpinscher |
alscan malamute | 2 - Alaskan Malamute |
Airdale terrier | 3 - Airedale Terrier |
aquita | 4 - Akita |
australian chepherd | 5 - Australian Shepherd |
What soundex does is look at the phonetics of a particular string and convert it to a standardized string (using this process letters and words that sound similar will return the same value, for instance, the letter 'd' will have the same value as the letter 't'). This doesn't sound very helpful at all, but it actually is, i.e if you execute the following query :
select 'text', soundex('text');
You'll get as result the following table :
text | SOUNDEX("text") |
text | T230 |
And if you execute text mispelled i.e tixt, then the table will be :
text | SOUNDEX("text") |
text | T230 |
Both strings return the same T230 value. Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English, SOUNDEX codes from different strings can be compared to see how similar the strings sound when spoken. The first character of the code is the first character of the expression, converted to upper case. The second through fourth characters of the code are numbers that represent the letters in the expression. The letters A, E, I, O, U, H, W, and Y are ignored unless they are the first letter of the string. All international alphabetic characters outside the A-Z range are treated as vowels. Hence, two strings that sound almost the same should have identical soundex strings. For instance, the words "text" and "tixt" both produce a soundex of “T230”.
Have fun !