I'm writing because I'd like to ask for some help creating a parser in Java for the following XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<phyloxml xmlns='http://www.phyloxml.org'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:schemaLocation='http://www.phyloxml.org
http://www.phyloxml.org/1.10/phyloxml.xsd'>
<phylogeny rooted='true'>
<clade>
<clade>
<clade branch_length='4.25'>
<clade branch_length='3.5'>
<clade branch_length='3.5'>
<name>B</name>
</clade>
<clade branch_length='3.5'>
<name>C</name>
</clade>
</clade>
<clade branch_length='7.0'>
<name>D</name>
</clade>
</clade>
<clade branch_length='10.25'>
<clade branch_length='1.0'>
<name>A</name>
</clade>
<clade branch_length='1.0'>
<name>E</name>
</clade>
</clade>
</clade>
</clade>
<name>description</name>
<description />
</phylogeny>
</phyloxml>
I've been working on this for 3 days and haven't come up with anything that works. I'm just getting started with XML parsing in Java, which is probably why I'm struggling. I need the names of the clades (e.g. "A B C") in groups that follow the tree structure (branch length), from the smallest group to the biggest. So I should end up with an ArrayList in which each element is a group of names (e.g. A, B, C ...) according to the branch length. For example, {A, E} is one element of the ArrayList, {B, C, D} is another, and so are {B, C} and {B, C, D, A, E}. For this XML I should get an ArrayList like this: [{D}, {B, C}, {A, E}, {B, C, D}, {A, E, B, C, D}]. Can someone help me with the parsing? I would be really grateful.
PS: In the example the names are strings, but in the actual file I need to use numbers (IDs) instead. Sorry for the indentation, by the way.
Here is what I have so far, and this is where I'm stuck:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class JavaApplication4 {
public static void main(String argv[]) {
try {
File fXmlFile = new File("C:/Users/GQ/workspace/UPGMA Algorithm/b.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
//optional, but recommended
//read this - http://stackoverflow.com/questions/13786607/normalization-in-dom-parsing-with-java-how-does-it-work
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("clade");
System.out.println("----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("branch length : " + eElement.getAttribute("branch_length"));
System.out.println("Name : " + eElement.getElementsByTagName("name").item(0).getTextContent());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
I don't know how to identify the clade names that are within a branch. For example, the branch with length 4.25 contains a clade named D and a sub-branch containing the clades named B and C. Can someone help me, please? Thanks.
This is a graphical representation of my XML file, to make clear what I want to do: http://i39.tinypic.com/2lna89x.jpg
I have to store all the possible groups in an ArrayList. They must be grouped according to their branch layout. For this XML file I need an ArrayList like the following: {{A,E}, {B,C}, {D}, {B,C,D}, {A,E,B,C,D}}
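A minimal sketch of one way to get these groups, assuming that each group is the set of leaf names found under a clade that has child clades. The class name CladeGroups and the helpers childElements, collect and groupsFromXml are made up for illustration. It walks direct children only (getElementsByTagName returns all descendants, so on an inner clade it would return the name of the first nested clade), drops the duplicate group produced by the two nested root clades, and sorts the groups by size. Under this assumption, single-leaf groups such as {D} are not produced; including {D} would need an extra rule that isn't spelled out here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class CladeGroups {

    // Direct child elements of parent with the given tag (no deeper descendants).
    static List<Element> childElements(Element parent, String tag) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                out.add((Element) n);
            }
        }
        return out;
    }

    // Returns the leaf names under this clade; records one group per clade
    // that has child clades (assumption: leaves alone do not form a group).
    static List<String> collect(Element clade, Set<List<String>> groups) {
        List<Element> subClades = childElements(clade, "clade");
        List<String> names = new ArrayList<>();
        if (subClades.isEmpty()) {
            for (Element name : childElements(clade, "name")) {
                names.add(name.getTextContent().trim());
            }
            return names;
        }
        for (Element sub : subClades) {
            names.addAll(collect(sub, groups));
        }
        groups.add(names); // the set silently drops an identical group
        return names;
    }

    static List<List<String>> groupsFromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Set<List<String>> groups = new LinkedHashSet<>();
        for (Element phylogeny : childElements(doc.getDocumentElement(), "phylogeny")) {
            for (Element clade : childElements(phylogeny, "clade")) {
                collect(clade, groups);
            }
        }
        List<List<String>> sorted = new ArrayList<>(groups);
        sorted.sort(Comparator.comparingInt(List::size)); // smallest group first
        return sorted;
    }

    public static void main(String[] args) throws Exception {
        // Shortened tree for the demo: ((B,C),D)
        String xml = "<phyloxml xmlns='http://www.phyloxml.org'><phylogeny rooted='true'>"
                + "<clade><clade branch_length='3.5'>"
                + "<clade branch_length='3.5'><name>B</name></clade>"
                + "<clade branch_length='3.5'><name>C</name></clade></clade>"
                + "<clade branch_length='7.0'><name>D</name></clade></clade>"
                + "<name>description</name></phylogeny></phyloxml>";
        System.out.println(groupsFromXml(xml));
    }
}
```

For the full XML in this post, groupsFromXml gives [[B, C], [A, E], [B, C, D], [B, C, D, A, E]]. With numeric IDs in the name elements, the same code applies, with Integer.parseInt on the text content if numbers are wanted instead of strings.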