PHP Boolean Text Search Parser
Posted on June 28, 2013, 6:43 pm
This is a simple PHP class that parses a text query into an ordered list of AND, OR, and EXACT (phrase) searches. It works recursively, resolving the innermost parenthesized groups first, and can display the query breakdown as an HTML table.
Complex queries can take the form "(a or b) and (c or d)". Positional (EXACT) queries must be wrapped in double quotes ("..."). Operators are written in lowercase. Queries cannot be ambiguous: mixing "and" and "or" at the same nesting level, as in "a and b or c and d", returns an error. Use explicit parentheses instead, for example "a and (b or (c and d))".
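The ambiguity rule can be sketched in isolation. The snippet below is only an illustration and not part of the class: isAmbiguousGroup() is a hypothetical helper name, and it assumes lowercase operators surrounded by single spaces.

```php
<?php
// Hypothetical helper: a group is ambiguous when it mixes both operators
// at the same nesting level, i.e. without parentheses separating them.
function isAmbiguousGroup($group)
{
    return strpos($group, ' and ') !== false
        && strpos($group, ' or ') !== false;
}

var_dump(isAmbiguousGroup('a and b or c and d')); // bool(true)
var_dump(isAmbiguousGroup('c and d'));            // bool(false)
```

Once the parentheses in "a and (b or (c and d))" have been resolved, every remaining group contains only one kind of operator, so none of them trips this check.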
Examples
Example query 1)
a and ("hello people" or (c and d))
s0 | hello people | Positional Search |
s1 | c and d | Intersection |
s2 | s0 or s1 | Union |
s3 | a and s2 | Intersection |
Example query 2)
( (abc and m and c) or (d or (e)) or g ) and (h and (i or j))
s0 | abc and m and c | Intersection |
s1 | e | Retrieve |
s2 | i or j | Union |
s3 | d or s1 | Union |
s4 | h and s2 | Intersection |
s5 | s0 or s3 or g | Union |
s6 | s5 and s4 | Intersection |
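The breakdowns above come from repeatedly swapping the innermost parenthesized groups for placeholders (s0, s1, ...) until no parentheses remain. A minimal sketch of that idea follows. It uses a simplified regex rather than the one in the class, it ignores quoted phrases, and reduceOnce() is a hypothetical name chosen for illustration.

```php
<?php
// Hypothetical helper: replace every innermost "( ... )" group (one that
// contains no nested parentheses) with a placeholder and record its content.
function reduceOnce($query, &$counter, array &$steps)
{
    return preg_replace_callback(
        '/\(([^()]*)\)/',
        function ($m) use (&$counter, &$steps) {
            $key = 's' . $counter++;
            $steps[$key] = trim($m[1]);
            return $key;
        },
        $query
    );
}

$steps = array();
$counter = 0;
$query = 'a and (b or (c and d))';

$query = reduceOnce($query, $counter, $steps); // "a and (b or s0)"
$query = reduceOnce($query, $counter, $steps); // "a and s1"
$steps['s' . $counter] = $query;               // what is left is the final step

print_r($steps);
```

Each pass can only see groups whose contents are already flat, which is why the steps come out in an order that can be evaluated from top to bottom.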
The Code
Copy the PHP class file and include it in your app:
<?php
/* File: PHP Boolean Text Search Parser
* Date: June 28, 2013
* License: MIT (http://opensource.org/licenses/MIT)
* Copyright (C) 2013 by http://codeeverywhere.ca
*/
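class parser
{
    // Counter value after the most recent placeholder was assigned
    private $last;

    // Query steps keyed by placeholder ("s0", "s1", ...), in evaluation order
    private $set = array();

    /**
     * Break a query into steps. Returns the array of steps, or an error
     * string if a step mixes "and" and "or" without parentheses.
     * $s is the next placeholder number; $arr is unused.
     */
    public function parse($q, $s = 0, $arr = null)
    {
        // A top-level call starts at 0, so discard steps from any earlier query
        if ($s === 0) {
            $this->set = array();
        }

        $q = stripcslashes($q);

        // Replace each quoted phrase with a placeholder
        if (preg_match_all('/"[a-zA-Z0-9\s]+"/', $q, $matches)) {
            $replace = array();
            $matches = $matches[0];
            for ($x = 0; $x < count($matches); $x++) {
                $replace[$x] = "s$s";
                $this->set["s$s"] = $matches[$x];
                $s++;
                $this->last = $s;
            }
            $q = str_replace($matches, $replace, $q);
        }

        // Match innermost groups: "(term)" or "(term and|or term ...)"
        $regex = '/(\(\s*[a-zA-Z0-9-]+\s*(\s(and|or)\s+[a-zA-Z0-9-]+\s*)*\))/';
        preg_match_all($regex, $q, $matches);
        $matches = $matches[1];
        $replace = array();
        for ($x = 0; $x < count($matches); $x++) {
            $replace[$x] = "s$s";
            $this->set["s$s"] = $matches[$x];
            $s++;
            $this->last = $s;
        }
        $q = str_replace($matches, $replace, $q);

        if (count($matches) == 0) {
            // Nothing left to reduce: the remainder is the final step
            $this->set["s$s"] = $q;
            foreach ($this->set as $key => $val) {
                // Strip the parentheses and quotes kept from the original query
                $this->set[$key] = trim(str_replace(array(')', '(', '"'), '', $val));
                if (strpos($val, ' and ') !== false && strpos($val, ' or ') !== false) {
                    return "Error - The query '$val' is ambiguous.";
                }
            }
            return $this->set;
        }

        // Groups were replaced, so reduce the shortened query again
        return $this->parse($q, $s, $arr);
    }

    /**
     * Print the steps as an HTML table: placeholder, expression, operation.
     */
    public function display()
    {
        echo "<table border=\"1\" cellpadding=\"5\">";
        foreach ($this->set as $s => $q) {
            if (strpos($q, ' and ') !== false) {
                $type = 'Intersection';
            } elseif (strpos($q, ' or ') !== false) {
                $type = 'Union';
            } elseif (strpos($q, ' ') !== false) {
                $type = 'Positional Search';
            } else {
                $type = 'Retrieve';
            }
            echo "<tr>";
            // Escape the expression, since it originates from user input
            echo "<th>$s</th><td>" . htmlspecialchars($q) . "</td><td>$type</td>";
            echo "</tr>";
        }
        echo "</table>";
    }
}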
Use the class:
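<?php
$p = new parser();
// Keep the query on one line: display() and the ambiguity check look for
// " and " / " or " with literal spaces, so a line break next to an operator
// would be mislabeled.
$q = '( (abc and m and c) or (("banana" or "orange") or (e)) or g ) '
   . 'and ("fruit juice" and h and (i or j))';
echo $q;
$p->parse($q);
$p->display();
?>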
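The array returned by parse() can be read as an execution plan, because each step only refers to placeholders defined before it. The sketch below shows one way such a plan might be evaluated. It is an assumption-laden illustration, not part of the class: evaluatePlan() and the $index array are hypothetical, the plan is hard-coded from Example query 1, and phrases are looked up as whole keys instead of performing a real positional search.

```php
<?php
// Hypothetical index: term or phrase => IDs of the documents that contain it.
$index = array(
    'a'            => array(1, 2, 3, 4),
    'c'            => array(2, 3),
    'd'            => array(3, 4),
    'hello people' => array(1),
);

// The steps parse() produces for: a and ("hello people" or (c and d))
$plan = array(
    's0' => 'hello people',
    's1' => 'c and d',
    's2' => 's0 or s1',
    's3' => 'a and s2',
);

// Hypothetical evaluator: resolves each step in order and returns the
// document IDs for every step, keyed by placeholder.
function evaluatePlan(array $plan, array $index)
{
    $results = array();
    foreach ($plan as $key => $expr) {
        $isAnd = strpos($expr, ' and ') !== false;
        $isOr  = !$isAnd && strpos($expr, ' or ') !== false;
        if ($isAnd) {
            $operands = explode(' and ', $expr);
        } elseif ($isOr) {
            $operands = explode(' or ', $expr);
        } else {
            $operands = array($expr); // single term or phrase
        }

        $acc = null;
        foreach ($operands as $operand) {
            $operand = trim($operand);
            // An earlier step takes priority over an index term of the same name
            if (isset($results[$operand])) {
                $ids = $results[$operand];
            } elseif (isset($index[$operand])) {
                $ids = $index[$operand];
            } else {
                $ids = array();
            }
            if ($acc === null) {
                $acc = $ids;
            } elseif ($isAnd) {
                $acc = array_intersect($acc, $ids); // Intersection
            } else {
                $acc = array_merge($acc, $ids);     // Union
            }
        }
        $acc = array_unique($acc);
        sort($acc); // also reindexes the array from 0
        $results[$key] = $acc;
    }
    return $results;
}

$results = evaluatePlan($plan, $index);
print_r(end($results)); // IDs matching the whole query: 1 and 3
```

One limitation of this sketch is that a search term that happens to look like a placeholder, such as "s2", would be confused with a step, so a real implementation would need a placeholder format that cannot occur in user input.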