php PCRE Regex optimization

Question

I'm quite new to regexes. I'm trying to optimize one, or at least find out whether there are better ways to write it.

Here is my input string:

$str = 'Some text
spanned on
several lines
txt_to_grab1 fixed_text1 txt_to_grab2
Full line to grab
txt_to_grab3 fixed_text2 txt_to_grab4
Some text after';

I'm trying to grab the lines from "txt_to_grab1" to "txt_to_grab4", but only the words "txt_to_grabX" and the line "Full line to grab".
I want to leave everything before and after untouched (i.e. line breaks), but remove the line breaks inside the lines I grab (as each line will be a <tr> that goes into an HTML table).

Regex patterns and replacements I found that match:

$find = "#(?<=\n)(.*?) fixed_text1 (.*?)(\n.*?\n)(.*?) fixed_text2 (.*?)(\n)#i";
$replace = '"$1" && "$2" grabbed.$3"$4" && "$5" grabbed.$6';   

$find = "#(.*)(?<=\n)(.*?) fixed_text1 (.*?)(\n)(.*)(?<=\n)(.*?) fixed_text2 (.*?)(\n.*)#is";
$replace = '$1"$2" && "$3" grabbed.$4$5"$6" && "$7" grabbed.$8';

Questions :

All the questions can be summed up as: are there better/shorter/faster patterns?

How can I make the patterns work with either \r\n or \n? I read somewhere on Stack Overflow that (\r?\n) would be a solution, but I don't know how to use it in lookbehinds. For example, the following patterns work, but I don't like them (they are dirty: only \n is used in the lookbehinds, which may produce unexpected results):
```
"#(?<=\n)(.*?) fixed_text1 (.*?)(\r?\n.*?\r?\n)(.*?) fixed_text2 (.*?)(\r?\n)#i"
"#(.*)(?<=\n)(.*?) fixed_text1 (.*?)(\r?\n)(.*)(?<=\n)(.*?) fixed_text2 (.*?)(\r?\n.*)#is";
```
Even better, how can I use the "s" modifier to remove all line breaks from the pattern, so that I can use (.*?) but still grab what I want? Word boundaries?
Is multiline mode (the m modifier) useful here?

I'd really like the regexes to be explained, if you provide some :)

Don't worry about faster unless you know it's a bottleneck. As far as better or shorter, that's fine to fix now. — ircmaxell, Nov 22 '10 at 15:25

Alan Moore · Accepted Answer · 2010-11-22 22:59:45Z


You don't need lookbehinds for this. Just use the start-of-line anchor at the beginning of your regex and the end-of-line anchor at the end (that's ^ and $ in multiline mode). To match the line separators in the middle you can use (?:\r\n|[\r\n]), a common idiom that covers the three usual styles of line separator: \n, \r, or \r\n.
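To make those two building blocks concrete, here is a minimal sketch. The sample strings are made up for the illustration and are not from the question.

```php
<?php
// Made-up sample: four words joined by three different separator styles.
$text = "one\r\ntwo\rthree\nfour";

// \r\n is tried first, so a Windows line break is consumed as a single
// separator instead of \r and \n being treated as two separate breaks.
$lines = preg_split('#(?:\r\n|[\r\n])#', $text);
print_r($lines);

// With the m modifier, ^ and $ match at line boundaries,
// not only at the start and end of the whole string.
$lf = "alpha\nbeta\ngamma";
preg_match_all('#^\w+$#m', $lf, $all);
print_r($all[0]);

// Without m the same pattern finds nothing, because the string
// as a whole is not a single word.
preg_match_all('#^\w+$#', $lf, $none);
var_dump(count($none[0]));
```

Putting \r\n first in the alternation matters: with [\r\n] alone, a \r\n pair would count as two line breaks.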

As for the s modifier (a.k.a. "single-line" or "DOTALL"), you don't need that either. All it does is allow the dot metacharacter to match line separators as well as all other characters, which doesn't do you any good. You want it to stop matching when it reaches line breaks, so you can exclude them from your captures.
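A quick way to see what the s modifier changes, using a made-up two-line string:

```php
<?php
// Made-up sample text with a single \n in the middle.
$text = "first line\nsecond line";

// Default: the dot stops at the line feed, so the greedy capture
// contains only the first line.
preg_match('#(.*)#', $text, $plain);

// With s, the dot also matches \n, so the same capture swallows both lines.
preg_match('#(.*)#s', $text, $dotall);

var_dump($plain[1]);
var_dump($dotall[1]);
```

The first capture is "first line" only, while the second spans the line break. That is why the s modifier works against you when each capture is supposed to end at the end of its line.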

Here's a demo:

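// $source holds the input string from the question (called $str there).
$pattern = '#^(.*?) fixed_text1 (.*)(?:\r\n|[\r\n])(.*)(?:\r\n|[\r\n])(.*?) fixed_text2 (.*)$#im';

// $m[0] is the whole match; $m[1] to $m[5] are the five capture groups.
preg_match($pattern, $source, $m);

echo "$m[1] && $m[2] grabbed.\n";
echo "$m[3]\n";
echo "$m[4] && $m[5] grabbed.\n";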

output:

txt_to_grab1 && txt_to_grab2 grabbed.
Full line to grab
txt_to_grab3 && txt_to_grab4 grabbed.

See it in action on ideone.com
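One thing to watch with \r\n input: assuming PCRE's default newline setting (LF), the dot is allowed to match \r, so a greedy (.*) may end up with a trailing \r in the capture. The variant below is a sketch of one way around that, not part of the original answer: it replaces the dot with [^\r\n] and the final $ with a lookahead. It still relies on ^ in multiline mode, so it assumes the input uses \n or \r\n (not bare \r) line breaks.

```php
<?php
// Hypothetical variant: [^\r\n] keeps \r out of every capture, and the
// lookahead (?=[\r\n]|\z) stands in for $ so the match can end before \r.
$pattern = '#^([^\r\n]*?) fixed_text1 ([^\r\n]*)(?:\r\n|[\r\n])'
         . '([^\r\n]*)(?:\r\n|[\r\n])'
         . '([^\r\n]*?) fixed_text2 ([^\r\n]*)(?=[\r\n]|\z)#im';

// The lines of the question's input, joined with each separator in turn.
$lines = [
    'Some text',
    'spanned on',
    'several lines',
    'txt_to_grab1 fixed_text1 txt_to_grab2',
    'Full line to grab',
    'txt_to_grab3 fixed_text2 txt_to_grab4',
    'Some text after',
];

$results = [];
foreach (['LF' => "\n", 'CRLF' => "\r\n"] as $name => $eol) {
    $source = implode($eol, $lines);
    if (preg_match($pattern, $source, $m)) {
        // Drop $m[0] (the whole match) and keep the five captures.
        $results[$name] = array_slice($m, 1);
    }
}

// The captures should be identical for both separator styles.
var_dump($results['LF'] === $results['CRLF']);
print_r($results['CRLF']);
```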

edited Nov 22 '10 at 22:59

answered Nov 22 '10 at 19:17


Damn, that looks better like this, thanks for the answer! I was using preg_replace with references; is preg_match faster? – zithro Nov 22 '10 at 19:44

Speed isn't the issue. To get the same result with preg_replace, you would have to match all of the text before and after the part you're interested in, just so you could throw it away (i.e., not reference it in the replacement string). – Alan Moore Nov 22 '10 at 19:58
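For the other use case from the question (rewriting the three lines in place while keeping the surrounding text), preg_replace with the same pattern is enough, because it only touches the matched part. A sketch, using the question's input and a replacement modelled on the one in the question:

```php
<?php
// The input string from the question, written with explicit \n line breaks.
$source = "Some text\nspanned on\nseveral lines\n"
        . "txt_to_grab1 fixed_text1 txt_to_grab2\n"
        . "Full line to grab\n"
        . "txt_to_grab3 fixed_text2 txt_to_grab4\n"
        . "Some text after";

// Pattern from the answer, in single quotes.
$pattern = '#^(.*?) fixed_text1 (.*)(?:\r\n|[\r\n])(.*)(?:\r\n|[\r\n])(.*?) fixed_text2 (.*)$#im';

// The separators are not captured, so they are written back explicitly.
$replace = '"$1" && "$2" grabbed.' . "\n"
         . '$3' . "\n"
         . '"$4" && "$5" grabbed.';

// Everything outside the match is copied through unchanged.
$result = preg_replace($pattern, $replace, $source);
echo $result, "\n";
```

The lines before "txt_to_grab1" and after "txt_to_grab4" never appear in the pattern or the replacement; they survive simply because they are not part of the match.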

Edit: the script doesn't work as is on my computer; var_dump(preg_match) returns 0. I had to replace all "\R" with "\r?\n", any idea why? (edit: thanks for the addition, I tried preg_replace and that works as well) – zithro Nov 22 '10 at 20:01

@zithro, if Alan's answer solves your problem, please mark it as the "accepted" answer. – Bart Kiers Nov 22 '10 at 20:32

Did you enclose the regex in single-quotes, or double-quotes? I tried it with double-quotes and it didn't work. The PHP interpreter is a little too lenient about such things IMO; it's probably throwing away the backslash. I always use single-quotes for my regexes to avoid that kind of problem. – Alan Moore Nov 22 '10 at 21:27
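To show what the quoting style changes before PCRE ever sees the pattern, here is a small sketch based on PHP's documented string escapes. Note that PHP keeps the backslash for escapes it does not recognise, so quoting alone may not explain the \R failure; that part stays an open guess.

```php
<?php
// Single quotes: only \\ and \' are escapes, so PCRE receives the
// four characters \ r \ n and interprets them itself.
var_dump(strlen('\r\n'));

// Double quotes: PHP converts \r and \n to the two control characters first.
var_dump(strlen("\r\n"));

// Octal escapes are the real trap: "\1" is chr(1), not a backreference.
var_dump("\1" === chr(1));

// So a backreference silently breaks in double quotes...
$doubleQuoted = preg_match("#(a)\1#", 'aa');
// ...and works in single quotes.
$singleQuoted = preg_match('#(a)\1#', 'aa');
var_dump($doubleQuoted, $singleQuoted);

// Escapes PHP does not recognise, such as \R, keep their backslash.
var_dump("\R" === '\R');
```

Sticking to single quotes means the pattern PCRE compiles is exactly the one you typed.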
