I am new to Android. In my application I have to parse XML data and display it on screen. For one particular tag I am unable to parse the data, because special characters appear inside that tag. My code is shown below.

My parser function:

    protected ArrayList<String> doInBackground(Context... params)
    {
//      context = params[0];
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // "items" and "name" are fields of the enclosing class
        items = new ArrayList<String>();
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(new java.net.URL("input URL_confidential").openConnection().getInputStream());
            //Document document = builder.parse(new URL("http://www.gamestar.de/rss/gamestar.rss").openConnection().getInputStream());
            Element root = document.getDocumentElement();
            NodeList docItems = root.getElementsByTagName("item");
            for (int i = 0; i < docItems.getLength(); i++)
            {
                Node nodeItem = docItems.item(i);
                if (nodeItem.getNodeType() == Node.ELEMENT_NODE)
                {
                    // Index-based access assumes there are no whitespace text nodes
                    // between the child elements; otherwise getFirstChild() returns null
                    NodeList element = nodeItem.getChildNodes();
                    name = element.item(0).getFirstChild().getNodeValue();

//                 System.out.println("description = "+element.item(2).getFirstChild().getNodeValue().replaceAll("&lt;div&gt;&lt;p&gt;"," "));
                    // Unescape the description text, then strip all markup with an empty whitelist
                    String description = element.item(2).getFirstChild().getNodeValue();
                    System.out.println("Description" + Jsoup.clean(org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(description), new Whitelist()));

                    items.add(name);
                }
            }
        }
        catch (ParserConfigurationException e)
        {
            e.printStackTrace();
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (SAXException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }

        return items;
    }

Input:

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>my application</title>
<link>http:// some link</link>
<atom:link href="http:// XXXXXXXX" rel="self"></atom:link>
<language>en-us</language>
<lastBuildDate>Thu, 20 Dec 2012</lastBuildDate>
<item>
<title>lllegal settlements</title>
<link>http://XXXXXXXXXXXXXXXX</link>
<description> &lt;div&gt;&lt;p&gt;
India was joined by all members of the 15-nation UN Security Council except the US to condemn Israel’s announcement of new construction activity in Palestinian territories and demand immediate dismantling of the “illegal†settlements.
&lt;/p&gt;
&lt;p&gt;
UN Secretary General Ban Ki-moon also expressed his deep concern by the heightened settlement activity in West Bank, saying the move by Israel “gravely threatens efforts to establish a viable Palestinian state.â€
&lt;/p&gt;
&lt;p&gt;
</description>
</item>
</channel>
</rss>

Expected output:

 lllegal settlements  ----> title tag text

     India was joined by all members of the 15-nation UN Security Council except the US to condemn Israel announcement of new construction activity in Palestinian territories and demand immediate dismantling of the illegal settlements. -----> description tag text

     UN Secretary General Ban Ki-moon also expressed his deep concern by the heightened settlement activity in West Bank, saying the move by Israel gravely threatens efforts to establish a viable Palestinian state.    ----> description tag text.
Post your xml response. – Dipak Keshariya Dec 19 '12 at 10:10
my xml response: – neha88 Dec 19 '12 at 10:17
<description> &lt;div&gt;&lt;p&gt; An independent inquiry into the September 11 attack on the US Consulate in Benghazi that killed the US ambassador to Libya and three other Americans has found that systematic failures at the State Department led to “grossly†inadequate security at the mission. &lt;/p&gt;</description> – neha88 Dec 19 '12 at 10:17

2 Answers

Your text node contains both escaped HTML entities (&gt; is >, greater than) and garbage characters (“grosslyâ€). First fix the encoding to match your input source; then you can unescape the HTML with Apache Commons Lang StringEscapeUtils.unescapeHtml4(String).

This method (hopefully) returns well-formed XML that you can query (for example with XPath) to extract the text node you want. Alternatively, you can pass the whole string to JSOUP or to the Android Html class:

// JSOUP, "html" is the unescaped string. Returns a string
Jsoup.parse(html).text();

// Android, same "html" string. Returns a string
android.text.Html.fromHtml(html).toString();
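
On the encoding point: the garbage characters are what you typically see when UTF-8 bytes are decoded with a single-byte charset. The following is only a minimal sketch of that effect using the JDK alone; the class and method names (EncodingDemo, misdecode, repair) are made up for illustration, and ISO-8859-1 is an assumption standing in for whatever charset is actually being applied to the feed.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Decodes UTF-8 bytes with the wrong charset, the way a misconfigured reader would.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake. This is lossless only because ISO-8859-1
    // maps every byte value to exactly one character.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Curly quotes around "grossly", written as Unicode escapes
        String original = "\u201Cgrossly\u201D inadequate";
        String garbled = misdecode(original);

        // Each curly quote is three bytes in UTF-8, so it turns into three characters
        System.out.println(original.length() + " characters originally, "
                + garbled.length() + " after mis-decoding");
        System.out.println("Repaired equals original: " + repair(garbled).equals(original));
    }
}
```

If the feed declares its encoding correctly and you hand the raw InputStream to the parser, the parser should pick the right charset itself, so a repair step like this is only needed when the text has already been decoded wrongly somewhere else.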

Test program (JSOUP and Commons-Lang required)

package stackoverflow;

import org.apache.commons.lang3.StringEscapeUtils;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class EmbeddedHTML {

    public static void main(String[] args) {
        String src = "<description> &lt;div&gt;&lt;p&gt; An independent" +
                " inquiry into the September 11 attack on the US Consulate" +
                " in Benghazi that killed the US ambassador to Libya and" +
                " three other Americans has found that systematic failures" +
                " at the State Department led to “grossly†inadequate" +
                " security at the mission. &lt;/p&gt;</description>";
        String unescaped = StringEscapeUtils.unescapeHtml4(src);
        System.out.println(Jsoup.clean(unescaped, new Whitelist()));
    }

}
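
The test program starts from a hard-coded string. As a rough sketch of how to get that string out of the feed in the first place, the code below looks up each item's description by tag name instead of by child index, so whitespace between the elements does not matter. It uses only the JDK DOM API; the class name FeedDescriptions and the sample feed are assumptions for illustration, not your real data. Note that the XML parser already turns &lt; and &gt; into < and >, so the returned text is an HTML fragment that can go straight to JSOUP.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class FeedDescriptions {

    // Returns the text of the first <description> of every <item> in the feed.
    static List<String> descriptions(byte[] xml) {
        List<String> result = new ArrayList<>();
        try {
            // Parsing from bytes lets the parser honour the encoding in the XML declaration
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList desc = item.getElementsByTagName("description");
                if (desc.getLength() > 0) {
                    // getTextContent() returns the text with XML entities already decoded
                    result.add(desc.item(0).getTextContent().trim());
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the feed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample feed, shaped like the one in the question
        String feed = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<rss version=\"2.0\"><channel>\n"
                + "<item>\n"
                + "<title>Sample title</title>\n"
                + "<description> &lt;div&gt;&lt;p&gt;Sample text&lt;/p&gt;&lt;/div&gt; </description>\n"
                + "</item>\n"
                + "</channel></rss>";
        for (String d : descriptions(feed.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(d);
        }
    }
}
```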
With "element.item(2).getFirstChild().getNodeValue()" – neha88 Dec 19 '12 at 10:44
I can reach the corresponding node, but I can't get its value. – neha88 Dec 19 '12 at 10:45
One more thing: I have no control over the input (i.e. the XML file), so I can't change it. – neha88 Dec 19 '12 at 10:47
You don't have to change anything. I can't understand why you think you got the right node if you are not able to see its content... – Raffaele Dec 19 '12 at 10:47
Because some special characters appear in that tag. That's the issue, Raffaele. – neha88 Dec 19 '12 at 10:50

Is there anything wrong with simply replacing the offending characters?

// Removes the escaped opening and closing div/p tags in one pass
string = string.replaceAll("&lt;/?(div|p)&gt;", "");
Thanks Aelexe. But I can't even get the data. I tried with my code above and it doesn't display anything. My problem is extracting the data; once I can extract it, I can apply replaceAll(). – neha88 Dec 19 '12 at 10:41
