I'm trying to use a regex in PHP to rewrite the src attribute of an image (or of any other tag).

I've a string like this:

$string2 = "<html><body><img src = 'images/test.jpg' /><img src = 'http://test.com/images/test3.jpg'/><video controls=\"controls\" src='../videos/movie.ogg'></video></body></html>";

And I would like to turn it into:

$string2 = "<html><body><img src = 'test.jpg' /><img src = 'test3.jpg'/><video controls=\"controls\" src='movie.ogg'></video></body></html>";

Here's what I tried:

$string2 = preg_replace("/src=["']([/])(.*)?["'] /", "'src=' . convert_url('$1') . ')'" , $string2);
echo htmlentities ($string2);

It didn't change anything and gave me a warning about an unescaped string.

Doesn't $1 pass the captured content to the function? What is wrong here?

The convert_url function comes from an example I posted here before:

function convert_url($url)
{
    if (preg_match('#^https?://#', $url)) {
        $url = parse_url($url, PHP_URL_PATH);
    }
    return basename($url);
}

It's supposed to strip the path from a URL and return just the filename.
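
As a quick check of that intent, here is a minimal sketch that runs convert_url on the three src values from the string above. The $samples array is only an illustrative name and not part of the original code.

```php
<?php
// convert_url() as defined above, repeated so the snippet runs on its own.
function convert_url($url)
{
    // For absolute http(s) URLs, keep only the path component.
    if (preg_match('#^https?://#', $url)) {
        $url = parse_url($url, PHP_URL_PATH);
    }
    // basename() returns the last path segment, i.e. the filename.
    return basename($url);
}

// $samples is a hypothetical name used only for this illustration.
$samples = array(
    'images/test.jpg',
    'http://test.com/images/test3.jpg',
    '../videos/movie.ogg',
);

foreach ($samples as $src) {
    echo $src, ' => ', convert_url($src), "\n";
}
```

Each of the three inputs should come back as just its filename: test.jpg, test3.jpg and movie.ogg.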

the original string and what you want to turn it into are both empty strings -- is something missing? – ametren May 18 '12 at 19:04
try to replace the " with \" – Mageek May 18 '12 at 19:06
Just edited it back. – Ashesh May 18 '12 at 19:07
You really shouldn't parse HTML with regex. You should find a pretty comprehensive answer as to why if you search SO. In the meantime, may I suggest DOM or SimpleXML – GordonM May 18 '12 at 20:34
I mean: in the regex, try replacing every " with \", except the first and the last. – Mageek May 19 '12 at 4:02

2 Answers

up vote 0 down vote accepted

You have to use the e modifier.

$string = "<html><body><img src='images/test.jpg' /><img src='http://test.com/images/test3.jpg'/><video controls=\"controls\" src='../videos/movie.ogg'></video></body></html>";

$string2 = preg_replace("~src=[']([^']+)[']~e", '"src=\'" . convert_url("$1") . "\'"', $string);

Note that when using the e modifier, the replacement script fragment needs to be a string to prevent it from being interpreted before the call to preg_replace.
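
The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on newer versions the same idea is usually written with preg_replace_callback. What follows is a minimal sketch, assuming the convert_url function from the question. The pattern (which also tolerates spaces around = and either quote style) and the anonymous callback are illustrative choices, not part of the original answer.

```php
<?php
// convert_url() from the question, repeated so the snippet is self-contained.
function convert_url($url)
{
    if (preg_match('#^https?://#', $url)) {
        $url = parse_url($url, PHP_URL_PATH);
    }
    return basename($url);
}

$string = "<html><body><img src = 'images/test.jpg' /><img src = 'http://test.com/images/test3.jpg'/><video controls=\"controls\" src='../videos/movie.ogg'></video></body></html>";

// Group 1: "src", optional spaces, "=", optional spaces.
// Group 2: the opening quote; \2 requires the same quote to close the value.
// Group 3: the attribute value itself.
$pattern = '~(\bsrc\s*=\s*)(["\'])(.*?)\2~i';

$string2 = preg_replace_callback($pattern, function ($m) {
    // Rebuild the attribute, swapping only the value for its filename.
    return $m[1] . $m[2] . convert_url($m[3]) . $m[2];
}, $string);

echo htmlentities($string2);
```

With the question's sample string, this should leave the markup untouched apart from the three src values, which become test.jpg, test3.jpg and movie.ogg. Anonymous functions need PHP 5.3 or later.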


Don't use regular expressions on HTML - use the DOMDocument class.

$html = "<html>
           <body>
             <img src='images/test.jpg' />
             <img src='http://test.com/images/test3.jpg'/>
             <video controls='controls' src='../videos/movie.ogg'></video>
           </body>
         </html>";

$dom = new DOMDocument;  
libxml_use_internal_errors(true);

$dom->loadHTML( $html ); 
$xpath = new DOMXPath( $dom );
libxml_clear_errors();

$doc = $dom->getElementsByTagName("html")->item(0);
$src = $xpath->query(".//@src");

foreach ( $src as $s ) {
  // Keep only the last path segment; end() needs a variable, not a call result
  $parts = explode( "/", $s->nodeValue );
  $s->nodeValue = end( $parts );
}

$output = $dom->saveXML( $doc );

echo $output;

Which outputs the following:

<html>
  <body>
    <img src="test.jpg">
    <img src="test3.jpg">
    <video controls="controls" src="movie.ogg"></video>
  </body>
</html>
The DOMDocument class is not very helpful if the HTML is embedded inside another tag, such as <script></script>. – Ashesh May 18 '12 at 19:08
@Ashesh I'm not sure I follow. You showed us PHP code - I'm showing you the solution. – Jonathan Sampson May 18 '12 at 19:11
Sorry, I should have been clearer. Here's what I'm talking about: "<html><head><script>var html = '<img src = /images/test.jpg/>'</script></head><body></html>". In this case, DOMDocument would not pick up the image tag inside the JavaScript. That's why I need to use regex. – Ashesh May 18 '12 at 19:13
@Ashesh The code above will work on the PHP string you have provided here. It converts the src attributes to point only to the filename. – Jonathan Sampson May 18 '12 at 19:29
Sometimes it's not a good idea to load an HTML parser, especially for short, predefined text values (e.g. <img alt="smth" src="smwhr"/>) where only src="" and alt="" can vary. – BasTaller Jun 19 '12 at 14:40
