
I found this answer - http://stackoverflow.com/a/7943464/1901367 - which gave me this really useful code for parsing search strings that contain quotes and whitespace:

preg_match_all('/(?<!")\b\w+\b|(?<=")\b[^"]+/', $subject, $result, PREG_PATTERN_ORDER);

I wondered if someone could tell me how to alter this code so that it leaves boolean operators such as + and - intact; the current code strips them out.

I want to run full-text boolean searches on my database using those operators, and I am confused by this regex, which I don't understand.
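For anyone else who finds the pattern hard to read, here is a sketch (not from the linked answer) that spells it out with PCRE's x (extended) modifier, which ignores whitespace in the pattern and allows # comments; it is meant to behave exactly like the one-line version. Running it on the example input from this question shows why the operators vanish: `\w` does not include `-` or `+`, so the first branch only starts matching at the word itself.

```php
<?php
// The pattern from the linked answer, rewritten with the x (extended) modifier
// so that whitespace is ignored and each branch carries a comment.
// It is intended to match exactly like '/(?<!")\b\w+\b|(?<=")\b[^"]+/'.
$pattern = '/
    (?<!")     # branch 1: not directly after a double quote,
    \b\w+\b    # then one whole word (letters, digits, underscore)
  |
    (?<=")     # branch 2: directly after a double quote,
    \b[^"]+    # then everything up to the next double quote (a phrase)
/x';

$subject = '"this is some" text here is -more -"exclude me"';

preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);

// $result[0] is ['this is some', 'text', 'here', 'is', 'more', 'exclude me']:
// the leading - never becomes part of a match, because \w cannot match it.
print_r($result[0]);
```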

Example input and output:

Input: '"this is some" text here is -more -"exclude me"'
Output (the items in the $result array): [this is some] [text] [here] [is] [-more] [-exclude me]

So everything separated by a space is an individual item, unless it is a phrase enclosed in double quotes. This part already works, but for -more and -"exclude me" the current result is [more] and [exclude me]: the minus sign, which I want to keep, is lost.

Thanks in advance!

Please provide example inputs and your expected outputs for those inputs. –  Mark Byers Dec 16 '12 at 23:04
    
Done, I've added it to the question –  David Healey Dec 16 '12 at 23:20

2 Answers

You can use a simple regular expression to pull out the tokens, including the quotes and everything inside them, and then tidy them up before you use them. Something like this:

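function query_tokens($query)
{
    // Optional leading + or - operator, then either a "quoted phrase"
    // or a single run of letters.
    $regex = '/[-+]?"[\pL\s]+"|[-+]?\pL+/';

    preg_match_all($regex, $query, $tokens, PREG_SET_ORDER);

    foreach ($tokens as &$token)
    {
        // With PREG_SET_ORDER each entry is an array; element 0 is the full match.
        $token = array_shift($token);

        $modifier = null;

        // Split off a leading boolean operator so the quotes can be trimmed.
        if ($token[0] === '-' || $token[0] === '+')
        {
            $modifier = $token[0];

            $token = substr($token, 1);
        }
        // Strip the surrounding double quotes from a phrase.
        if ($token[0] === '"')
        {
            $token = trim($token, '"');
        }
        // Re-attach the operator, if there was one.
        $token = $modifier.$token;
    }
    // Break the reference left behind by foreach (... as &$token).
    unset($token);

    return $tokens;
}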

The input string and the result of the function:

var_dump(query_tokens('"this is some" text here is -more -"exlude me"'));
array (size=6)
  0 => string 'this is some' (length=12)
  1 => string 'text' (length=4)
  2 => string 'here' (length=4)
  3 => string 'is' (length=2)
  4 => string '-more' (length=5)
  5 => string '-exlude me' (length=10)

Regular expressions are great, but sometimes they can make things more complicated than they need to be.
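For comparison, here is a sketch of a different approach (not the answerer's code): capture groups hand back the operator and the phrase separately, so there is no substr/trim step. `tokenize_query` is a hypothetical name, and unlike `query_tokens` above it treats any run of non-space, non-quote characters as a word, so digits are kept too.

```php
<?php
// Hypothetical helper: split a search string into tokens, keeping a leading
// + or - operator and removing the double quotes around phrases.
function tokenize_query(string $query): array
{
    // Group 1: optional operator; group 2: quoted phrase; group 3: bare word.
    $regex = '/([-+]?)(?:"([^"]*)"|([^\s"]+))/';

    preg_match_all($regex, $query, $matches, PREG_SET_ORDER);

    $tokens = [];
    foreach ($matches as $m) {
        // Group 3 is only present in $m when the bare-word branch matched;
        // otherwise the phrase branch (group 2) did.
        $body = isset($m[3]) ? $m[3] : $m[2];

        // Skip empty phrases such as "".
        if ($body === '') {
            continue;
        }
        $tokens[] = $m[1] . $body;
    }
    return $tokens;
}

print_r(tokenize_query('+apple -"red fruit" pear'));
```

An unmatched double quote is not treated as the start of a phrase; the words after it come back as separate tokens.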

Thanks, I'll try this out later –  David Healey Dec 17 '12 at 10:14

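You cannot capture '-exclude me' as a single match with a regular expression, because a match is always one contiguous piece of the input, and in -"exclude me" the quote sits between the minus sign and the phrase. At best, you can modify the regular expression so that it matches the '-more' token as a whole: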

(?<!")-?\b\w+\b|(?<=")\b[^"]+
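Running the modified pattern on the question's example makes the limitation concrete. A sketch, with the / delimiters added for PHP (they are not part of the pattern as posted):

```php
<?php
// The modified pattern from this answer, wrapped in PHP's / delimiters.
$pattern = '/(?<!")-?\b\w+\b|(?<=")\b[^"]+/';
$subject = '"this is some" text here is -more -"exclude me"';

preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);

// -more now keeps its minus sign, but the quoted phrase still loses it:
// $result[0] is ['this is some', 'text', 'here', 'is', '-more', 'exclude me']
print_r($result[0]);
```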

Hmm I see, thanks for your answer –  David Healey Dec 17 '12 at 10:13
