
I'm writing an RSS-to-JSON parser, and as part of that I need to run htmlentities() on any markup found inside the description tag. I'm currently trying to do this with preg_replace(), but I'm struggling with it. My current (non-working) code looks like this:

$pattern[0] = "/\<description\>(.*?)\<\/description\>/is";
$replace[0] = '<description>'.htmlentities("$1").'</description>';
$rawFeed = preg_replace($pattern, $replace, $rawFeed);

If you have a more elegant solution to this as well, please share. Thanks.


2 Answers

up vote 3 down vote accepted

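Simple. Your version calls htmlentities() on the literal string "$1" before preg_replace() ever runs, so the matched text is never encoded. Use preg_replace_callback instead: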

function _handle_match($match)
{
    return '<description>' . htmlentities($match[1]) . '</description>';
}

$pattern = "/\<description\>(.*?)\<\/description\>/is";
$rawFeed = preg_replace_callback($pattern, '_handle_match', $rawFeed);

It accepts any PHP callback type, so class methods work too.
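As a rough sketch of the class-method variant, the callback can be passed as an array of object and method name. The class name FeedEscaper, its method names, and the sample input are made up for illustration and are not part of the original question.

```php
<?php
// Hypothetical helper class; FeedEscaper is an illustrative name, not from the question.
class FeedEscaper
{
    // Matches each <description>...</description> pair, non-greedy, across newlines.
    private $pattern = '/<description>(.*?)<\/description>/is';

    // Callback: $match[0] is the whole match, $match[1] is the captured inner text.
    public function escapeMatch($match)
    {
        return '<description>' . htmlentities($match[1]) . '</description>';
    }

    public function escape($rawFeed)
    {
        // array($object, 'methodName') is PHP's callback form for an instance method.
        return preg_replace_callback($this->pattern, array($this, 'escapeMatch'), $rawFeed);
    }
}

// Illustrative input, not a real feed.
$rawFeed = '<item><description><b>Hi</b></description></item>';
$escaper = new FeedEscaper();
$escaped = $escaper->escape($rawFeed);
echo $escaped, "\n";
```

One thing to watch: htmlentities() also re-encodes entities that are already present by default, so a description containing &amp; would come out double-encoded unless you pass its double_encode argument as false.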

That did the trick, thank you. – VirtuosiMedia Sep 24 '08 at 17:07
up vote 0 down vote

A more elegant solution would be to use SimpleXML, or a third-party library such as XML_Feed_Parser or Zend_Feed, to parse the feed.

Here is a SimpleXML example:

<?php
$rss = file_get_contents('http://rss.slashdot.org/Slashdot/slashdot');
$xml = simplexml_load_string($rss);

foreach ($xml->item as $item) {
    echo "{$item->description}\n\n";
}
?>

Keep in mind that RSS, RDF, and Atom feeds are structured differently, which is why it can make sense to use one of the libraries mentioned above.
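To illustrate the structural difference, here is a minimal sketch that reads the same information from a tiny RSS 2.0 document and a tiny Atom document. Both feeds are hand-written sample data, and feed_descriptions() is a hypothetical helper name. It only handles these two root elements and only Atom's summary element, which is exactly the sort of gap a dedicated feed library covers.

```php
<?php
// Two tiny hand-written feeds (illustrative data, not real feed output).
$rss2 = '<rss version="2.0"><channel><title>Demo</title>'
      . '<item><title>First</title><description>Hello</description></item>'
      . '</channel></rss>';

$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Demo</title>'
      . '<entry><title>First</title><summary>Hello</summary></entry>'
      . '</feed>';

// Hypothetical helper: returns the body text of every item (RSS 2.0) or entry (Atom).
function feed_descriptions($rawFeed)
{
    $xml = simplexml_load_string($rawFeed);
    $out = array();
    if ($xml === false) {
        return $out; // not well-formed XML
    }
    if ($xml->getName() === 'rss') {
        // RSS 2.0: items are nested inside <channel>.
        foreach ($xml->channel->item as $item) {
            $out[] = (string) $item->description;
        }
    } elseif ($xml->getName() === 'feed') {
        // Atom: entries sit directly under the root and use <summary> (or <content>).
        foreach ($xml->entry as $entry) {
            $out[] = (string) $entry->summary;
        }
    }
    return $out;
}

print_r(feed_descriptions($rss2));
print_r(feed_descriptions($atom));
```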

I am actually using simpleXML, but the problem is that any embedded HTML inside the description tag also becomes an object, which is why I am entity encoding it first. – VirtuosiMedia Sep 24 '08 at 17:16
Your feed is broken then. Good feeds wrap HTML and similar in CDATA. – Till Sep 24 '08 at 18:36
When I said "good", I meant "valid". :) – Till Sep 24 '08 at 18:37
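For reference, a minimal sketch of the CDATA case mentioned in the comments above, using a hand-written sample item rather than a real feed. When the HTML is wrapped in CDATA, SimpleXML should see it as plain text rather than as child elements, so a string cast returns the markup unchanged.

```php
<?php
// Hand-written sample item (illustrative data): the HTML is wrapped in CDATA.
$rss = '<rss version="2.0"><channel><item>'
     . '<description><![CDATA[<b>Hi</b> there]]></description>'
     . '</item></channel></rss>';

$xml = simplexml_load_string($rss);

// Casting the element to string yields the CDATA text, markup included.
$description = (string) $xml->channel->item->description;
echo $description, "\n";

// The <b> tag is text here, not an element, so there are no child nodes.
$childCount = count($xml->channel->item->description->children());
echo $childCount, "\n";
```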
