XML parsing project in Java

Question

I would like some help creating a parser in Java for the following XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<phyloxml xmlns='http://www.phyloxml.org'
          xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
          xsi:schemaLocation='http://www.phyloxml.org
                              http://www.phyloxml.org/1.10/phyloxml.xsd'>
  <phylogeny rooted='true'>
    <clade>
      <clade>
        <clade branch_length='4.25'>
          <clade branch_length='3.5'>
            <clade branch_length='3.5'>
              <name>B</name>
            </clade>
            <clade branch_length='3.5'>
              <name>C</name>
            </clade>
          </clade>
          <clade branch_length='7.0'>
            <name>D</name>
          </clade>
        </clade>
        <clade branch_length='10.25'>
          <clade branch_length='1.0'>
            <name>A</name>
          </clade>
          <clade branch_length='1.0'>
            <name>E</name>
          </clade>
        </clade>
      </clade>
    </clade>
    <name>description</name>
    <description />
  </phylogeny>
</phyloxml>

I've been working on this for 3 days and haven't come up with anything that works. I'm just starting out with XML parsing in Java, which is probably why. I need to get the names of the clades (e.g. "A", "B", "C") in groups that follow the tree structure (branch length), from the smallest group to the biggest. So I should end up with an ArrayList in which each element is a group of names. For example, {A, E} is one element of the ArrayList, {B, C, D} is another, and so on: {B, C}, {B, C, D, A, E}. For this XML I should get an ArrayList like this: [{D}, {B, C}, {A, E}, {B, C, D}, {A, E, B, C, D}]. Can someone help me with the parsing? I would be really grateful.
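A minimal sketch of one way to get these groups, assuming that a "group" is the set of leaf names under a clade and that only leaf clades carry a `<name>`. It walks the `<clade>` elements recursively, so the nesting is preserved instead of flattened. The class and method names (`CladeGroups`, `collect`, `groups`) are illustrative, not from any library. Under this rule the example file yields {B, C}, {B, C, D}, {A, E} and {B, C, D, A, E} (the two outermost clades contain the same leaves, so that group is kept once); the singleton {D} from the list above would need an extra rule, for instance also keeping single leaves.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CladeGroups {
    // The tree from the question, inlined so the sketch runs without a file
    static final String SAMPLE =
        "<phyloxml xmlns='http://www.phyloxml.org'><phylogeny rooted='true'>"
        + "<clade><clade>"
        + "<clade branch_length='4.25'>"
        + "<clade branch_length='3.5'>"
        + "<clade branch_length='3.5'><name>B</name></clade>"
        + "<clade branch_length='3.5'><name>C</name></clade>"
        + "</clade>"
        + "<clade branch_length='7.0'><name>D</name></clade>"
        + "</clade>"
        + "<clade branch_length='10.25'>"
        + "<clade branch_length='1.0'><name>A</name></clade>"
        + "<clade branch_length='1.0'><name>E</name></clade>"
        + "</clade>"
        + "</clade></clade>"
        + "<name>description</name><description/>"
        + "</phylogeny></phyloxml>";

    // Returns the leaf names below `clade`. Every clade with two or more
    // leaves is recorded in `groups` after its children (post-order).
    static List<String> collect(Element clade, Set<List<String>> groups) {
        List<String> leaves = new ArrayList<>();
        String ownName = null;
        for (Node n = clade.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            if ("clade".equals(n.getNodeName())) {
                leaves.addAll(collect((Element) n, groups)); // recurse into child clade
            } else if ("name".equals(n.getNodeName())) {
                ownName = n.getTextContent().trim();
            }
        }
        if (leaves.isEmpty() && ownName != null) {
            leaves.add(ownName); // no child clades: this clade is a leaf
        } else if (leaves.size() > 1) {
            groups.add(leaves); // the LinkedHashSet drops identical groups
        }
        return leaves;
    }

    static List<List<String>> groups(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        // The first <clade> in document order is the root of the tree
        Element root = (Element) doc.getElementsByTagName("clade").item(0);
        Set<List<String>> groups = new LinkedHashSet<>();
        collect(root, groups);
        return new ArrayList<>(groups);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(groups(SAMPLE));
    }
}
```

Running `main` prints `[[B, C], [B, C, D], [A, E], [B, C, D, A, E]]`. The order is post-order (inner groups before the groups containing them), not sorted by size; sorting is a separate step.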

P.S.: In the example I'm using names that are strings, but in the actual file I need to use numbers (ids) instead. Sorry for the indentation, by the way.
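If the `<name>` elements in the real file hold numeric ids, the same DOM calls work and only the conversion changes. A small sketch, assuming each leaf's `<name>` contains a plain integer; the XML below is made up for illustration, and `NumericLeaves`/`leafIds` are hypothetical names. `Integer.parseInt` throws `NumberFormatException` for any name that is not an integer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NumericLeaves {
    // Hypothetical input: a two-leaf tree whose names are residue numbers
    static final String SAMPLE =
        "<phyloxml xmlns='http://www.phyloxml.org'><phylogeny rooted='true'>"
        + "<clade>"
        + "<clade branch_length='1.0'><name>42</name></clade>"
        + "<clade branch_length='1.0'><name>107</name></clade>"
        + "</clade>"
        + "</phylogeny></phyloxml>";

    // Returns the <name> of every leaf clade (a clade with no nested clade) as an int
    static List<Integer> leafIds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<Integer> ids = new ArrayList<>();
        NodeList clades = doc.getElementsByTagName("clade");
        for (int i = 0; i < clades.getLength(); i++) {
            Element clade = (Element) clades.item(i);
            if (clade.getElementsByTagName("clade").getLength() > 0) continue; // inner clade
            NodeList names = clade.getElementsByTagName("name");
            if (names.getLength() > 0) {
                ids.add(Integer.parseInt(names.item(0).getTextContent().trim()));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(leafIds(SAMPLE)); // [42, 107]
    }
}
```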

This is what I have so far, and I'm stuck:

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;

public class JavaApplication4 {

    public static void main(String[] args) {
        try {
            File fXmlFile = new File("C:/Users/GQ/workspace/UPGMA Algorithm/b.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(fXmlFile);

            //optional, but recommended
            //read this - http://stackoverflow.com/questions/13786607/normalization-in-dom-parsing-with-java-how-does-it-work
            doc.getDocumentElement().normalize();

            System.out.println("Root element :" + doc.getDocumentElement().getNodeName());

            // Every <clade> in the document, flattened in document order:
            // the nesting of the tree is not visible in this list
            NodeList nList = doc.getElementsByTagName("clade");

            System.out.println("----------------------------");

            for (int temp = 0; temp < nList.getLength(); temp++) {
                Node nNode = nList.item(temp);
                System.out.println("\nCurrent Element :" + nNode.getNodeName());

                if (nNode.getNodeType() == Node.ELEMENT_NODE) {
                    Element eElement = (Element) nNode;
                    // Empty string for clades without a branch_length attribute
                    System.out.println("branch length : " + eElement.getAttribute("branch_length"));
                    System.out.println("Name : " + directChildName(eElement));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Text of the clade's own <name> child, or "(none)" for inner clades.
    // eElement.getElementsByTagName("name") would search all descendants and
    // report the first leaf's name for every inner clade.
    private static String directChildName(Element clade) {
        for (Node child = clade.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && "name".equals(child.getNodeName())) {
                return child.getTextContent();
            }
        }
        return "(none)";
    }
}
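The tree structure gets lost because `getElementsByTagName` searches all descendants, not just direct children: a loop over `doc.getElementsByTagName("clade")` sees every clade in document order with no nesting, and `clade.getElementsByTagName("name").item(0)` on an inner clade returns the first leaf name below it. A sketch contrasting that with walking only direct children, on a cut-down version of the tree (`DirectChildren`, `rootClade` and `childElements` are illustrative names):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DirectChildren {
    // Cut-down version of the question's tree: root -> {B, C} and D
    static final String SAMPLE =
        "<phyloxml xmlns='http://www.phyloxml.org'><phylogeny rooted='true'>"
        + "<clade>"
        + "<clade branch_length='3.5'>"
        + "<clade branch_length='3.5'><name>B</name></clade>"
        + "<clade branch_length='3.5'><name>C</name></clade>"
        + "</clade>"
        + "<clade branch_length='7.0'><name>D</name></clade>"
        + "</clade>"
        + "</phylogeny></phyloxml>";

    // The first <clade> in document order is the root of the tree
    static Element rootClade(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return (Element) doc.getElementsByTagName("clade").item(0);
    }

    // Direct child elements of `parent` named `tag` (unlike getElementsByTagName,
    // which also returns grandchildren and deeper descendants)
    static List<Element> childElements(Element parent, String tag) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Element root = rootClade(SAMPLE);
        System.out.println(root.getElementsByTagName("clade").getLength()); // 4: every clade below root
        System.out.println(childElements(root, "clade").size());           // 2: only {B, C} and D
        System.out.println(root.getElementsByTagName("name").item(0).getTextContent()); // B
    }
}
```

Calling `childElements` recursively on each returned clade visits the tree level by level, which is what grouping by branch needs.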

I don't know how to identify which clade names belong to a branch. For example, the branch with length 4.25 contains the clade named D, plus a sub-branch containing the clades named B and C. Can someone help me, please? Thanks
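For a single branch, the descendant search is exactly what is needed: calling `getElementsByTagName("name")` on a `<clade>` element (rather than on the whole document) returns every `<name>` inside that clade's subtree. A sketch, assuming only leaf clades carry a `<name>` and matching `branch_length` as the literal string in the file (`BranchLeaves` and `leavesUnder` are illustrative names):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BranchLeaves {
    // The tree from the question, inlined so the sketch runs without a file
    static final String SAMPLE =
        "<phyloxml xmlns='http://www.phyloxml.org'><phylogeny rooted='true'>"
        + "<clade><clade>"
        + "<clade branch_length='4.25'>"
        + "<clade branch_length='3.5'>"
        + "<clade branch_length='3.5'><name>B</name></clade>"
        + "<clade branch_length='3.5'><name>C</name></clade>"
        + "</clade>"
        + "<clade branch_length='7.0'><name>D</name></clade>"
        + "</clade>"
        + "<clade branch_length='10.25'>"
        + "<clade branch_length='1.0'><name>A</name></clade>"
        + "<clade branch_length='1.0'><name>E</name></clade>"
        + "</clade>"
        + "</clade></clade>"
        + "<name>description</name><description/>"
        + "</phylogeny></phyloxml>";

    // Names of all leaves under the first clade (in document order) whose
    // branch_length attribute is exactly `length`; empty list if none matches
    static List<String> leavesUnder(String xml, String length) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList clades = doc.getElementsByTagName("clade");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < clades.getLength(); i++) {
            Element clade = (Element) clades.item(i);
            if (!length.equals(clade.getAttribute("branch_length"))) continue;
            // Called on the clade, this searches only that clade's subtree
            NodeList found = clade.getElementsByTagName("name");
            for (int j = 0; j < found.getLength(); j++) {
                names.add(found.item(j).getTextContent().trim());
            }
            break;
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(leavesUnder(SAMPLE, "4.25")); // [B, C, D]
        System.out.println(leavesUnder(SAMPLE, "3.5"));  // [B, C]
    }
}
```

The second call matches the inner 3.5 clade because it comes first in document order; clades that share a length would need to be told apart by position instead.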

Here is a graphical representation of my XML file, to make clear what I want to do: http://i39.tinypic.com/2lna89x.jpg

I have to store all the possible groups in an ArrayList, grouped by their branch layout. For this XML file the ArrayList should be {{A,E}, {B,C}, {D}, {B,C,D}, {A,E,B,C,D}}.
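Ordering the groups from the smallest to the biggest is a separate step from the parsing. A sketch assuming the groups have already been collected as sets (`SortGroups` and `bySize` are illustrative names); `List.sort` is stable, so groups of equal size keep their original relative order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SortGroups {
    // Returns a new ArrayList with the groups ordered by size, smallest first
    static <T> List<Set<T>> bySize(Collection<Set<T>> groups) {
        List<Set<T>> sorted = new ArrayList<>(groups);
        sorted.sort(Comparator.comparingInt(Set::size));
        return sorted;
    }

    public static void main(String[] args) {
        // LinkedHashSet keeps insertion order, so printing is predictable
        List<Set<String>> groups = new ArrayList<>();
        groups.add(new LinkedHashSet<>(List.of("A", "E", "B", "C", "D")));
        groups.add(new LinkedHashSet<>(List.of("B", "C", "D")));
        groups.add(new LinkedHashSet<>(List.of("A", "E")));
        groups.add(new LinkedHashSet<>(List.of("B", "C")));
        groups.add(new LinkedHashSet<>(List.of("D")));
        System.out.println(bySize(groups));
    }
}
```

Running `main` prints `[[D], [A, E], [B, C], [B, C, D], [A, E, B, C, D]]`.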

Why do you want to parse XML on your own? There are lots of XML parsers out there. You should probably look into an XQuery processor, as this will be a simple group-by query in XQuery. — dirkk, Jul 17 '13 at 10:47
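XQuery itself is not part of the standard JDK, but XPath is (`javax.xml.xpath`), and it can answer the same kind of question without a hand-written traversal. A sketch swapping in XPath for XQuery: `local-name()` is used because the document declares a default namespace (`http://www.phyloxml.org`), which an unprefixed path such as `//clade` may not match when the input is parsed namespace-aware. `XPathLeaves`, `texts` and `UNDER_4_25` are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathLeaves {
    // The tree from the question, inlined so the sketch runs without a file
    static final String SAMPLE =
        "<phyloxml xmlns='http://www.phyloxml.org'><phylogeny rooted='true'>"
        + "<clade><clade>"
        + "<clade branch_length='4.25'>"
        + "<clade branch_length='3.5'>"
        + "<clade branch_length='3.5'><name>B</name></clade>"
        + "<clade branch_length='3.5'><name>C</name></clade>"
        + "</clade>"
        + "<clade branch_length='7.0'><name>D</name></clade>"
        + "</clade>"
        + "<clade branch_length='10.25'>"
        + "<clade branch_length='1.0'><name>A</name></clade>"
        + "<clade branch_length='1.0'><name>E</name></clade>"
        + "</clade>"
        + "</clade></clade>"
        + "<name>description</name><description/>"
        + "</phylogeny></phyloxml>";

    // All <name> elements below the clade whose branch_length is 4.25
    static final String UNDER_4_25 =
        "//*[local-name()='clade'][@branch_length='4.25']//*[local-name()='name']";

    // Evaluates `expr` against `xml` and returns the text of each matched node
    static List<String> texts(String xml, String expr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(texts(SAMPLE, UNDER_4_25)); // [B, C, D]
    }
}
```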
There are even Java-based tools available at phyloxml.org to parse this format. Look for "forester.jar". — Thilo, Jul 17 '13 at 10:51
code.google.com/p/forester/source/browse/forester/java/src/org/… — Thilo, Jul 17 '13 at 10:53
Many thanks for the answers. I wrote something in Java but got stuck. What I need is just a simple ArrayList containing the elements grouped by their branch_length attribute (taking the hierarchy of the tree into account). I'm in a hurry because my deadline is very close and I've been stuck for 3 days. By the way, I try to avoid code from code.google.com because much of what I found there has no comments, and the libraries are rarely included with the source. The phylogeny tool is for DNA/gene parsing; I need to do this with residue numbers of a protein interface (that's why I didn't use it). — user2590983, Jul 17 '13 at 12:13
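One literal reading of "grouped by their branch_length attribute" is to group the leaves by the length of their own branch. A sketch of that reading, which is an assumption about what is wanted (`LeavesByLength` and `byLength` are illustrative names): on the example file it gives {B, C} for 3.5, {D} for 7.0 and {A, E} for 1.0, i.e. the three smallest groups in the list from the question, while the larger groups {B, C, D} and {A, E, B, C, D} would still have to come from the nesting of the clades.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LeavesByLength {
    // The tree from the question, inlined so the sketch runs without a file
    static final String SAMPLE =
        "<phyloxml xmlns='http://www.phyloxml.org'><phylogeny rooted='true'>"
        + "<clade><clade>"
        + "<clade branch_length='4.25'>"
        + "<clade branch_length='3.5'>"
        + "<clade branch_length='3.5'><name>B</name></clade>"
        + "<clade branch_length='3.5'><name>C</name></clade>"
        + "</clade>"
        + "<clade branch_length='7.0'><name>D</name></clade>"
        + "</clade>"
        + "<clade branch_length='10.25'>"
        + "<clade branch_length='1.0'><name>A</name></clade>"
        + "<clade branch_length='1.0'><name>E</name></clade>"
        + "</clade>"
        + "</clade></clade>"
        + "<name>description</name><description/>"
        + "</phylogeny></phyloxml>";

    // Maps each leaf clade's branch_length to the leaf names that share it,
    // in order of first appearance in the file
    static Map<String, List<String>> byLength(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Map<String, List<String>> groups = new LinkedHashMap<>();
        NodeList clades = doc.getElementsByTagName("clade");
        for (int i = 0; i < clades.getLength(); i++) {
            Element clade = (Element) clades.item(i);
            if (clade.getElementsByTagName("clade").getLength() > 0) continue; // inner clade
            NodeList names = clade.getElementsByTagName("name");
            if (names.getLength() == 0) continue;
            groups.computeIfAbsent(clade.getAttribute("branch_length"), k -> new ArrayList<>())
                  .add(names.item(0).getTextContent().trim());
        }
        return groups;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(byLength(SAMPLE)); // {3.5=[B, C], 7.0=[D], 1.0=[A, E]}
    }
}
```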

asked	6 months ago
viewed	132 times
