I've got a pretty serious problem with XML creation using the standard Java classes. My code is as follows:

//Generate DOM
DOMSource source = this.generateDomDocument(params...);

//WRITE XML FILE
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();

//Properties
transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, STRING_FIELD_DTD);
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");

//Convert and write to disk
transformer.transform(source, new StreamResult(
                      new OutputStreamWriter(new FileOutputStream(fileName), "UTF-8")));

The problem is that the transformer turns carriage returns into &#13; entities, which I don't want in the resulting XML. For example, my result file holds translations written in several different languages (that's why I use UTF-8), and every entry whose text contains a carriage return comes out the same way:

<content langID="EN">
                    <desc> Test string&#13;
do not copy.</desc>

To clear things up, this is what I expect in the XML:

<content langID="EN">
                    <desc> Test string
do not copy.</desc>

I searched for the issue on Google and here too, but there seems to be no solution or workaround.

  • I'm not sure I understand the problem correctly: the input data contains CRs, and the XML output should a) not include them, b) simply have a line break there, or c) ...? Commented Jul 3, 2012 at 20:02
  • The input data contains CRs, and the XML output should simply have a line break, not something like &#13; followed by CRLF. I inspected the XML with Notepad++, and that is what I found.
    – OverLex
    Commented Jul 4, 2012 at 9:53

1 Answer

After a lot of work I found two solutions to my own problem, although they are more workarounds than real solutions:

Solution 1

Create a class that extends FilterOutputStream and implements the necessary methods to write every character to the final stream (a file in the case above) except the unwanted ones, i.e. the &#13; sequences. To write to the stream, just add the filter:

 StreamResult result = new StreamResult(
                           new OutputStreamWriter(
                               new XMLFilterOutputStream(
                                   new FileOutputStream(filename)),"UTF-8"));
 transformer.transform(source, result);
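
The answer does not show XMLFilterOutputStream itself, so the following is only a minimal sketch of what such a filter could look like, not the original class. It assumes the serializer emits the decimal form &#13; (other implementations might emit a hex form instead) and that the output encoding is ASCII-compatible, such as UTF-8, so matching on raw bytes is safe. The field and helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical filter: drops every "&#13;" byte sequence and passes everything else through. */
class XMLFilterOutputStream extends FilterOutputStream {
    private static final byte[] PATTERN = "&#13;".getBytes(StandardCharsets.US_ASCII);

    // Number of leading PATTERN bytes seen so far and held back from the target stream.
    private int matched = 0;

    XMLFilterOutputStream(OutputStream out) {
        super(out);
    }

    // FilterOutputStream routes write(byte[], int, int) through this method,
    // so a sequence split across several writes is still detected.
    @Override
    public void write(int b) throws IOException {
        if ((byte) b == PATTERN[matched]) {
            matched++;
            if (matched == PATTERN.length) {
                matched = 0; // full "&#13;" seen: discard it
            }
            return;
        }
        releasePending();
        if ((byte) b == PATTERN[0]) {
            matched = 1; // this byte may start a new "&#13;"
        } else {
            out.write(b);
        }
    }

    // Emit a held-back partial match (for example "&#1") that turned out not to be "&#13;".
    private void releasePending() throws IOException {
        if (matched > 0) {
            out.write(PATTERN, 0, matched);
            matched = 0;
        }
    }

    @Override
    public void close() throws IOException {
        releasePending();
        super.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new XMLFilterOutputStream(sink)) {
            out.write("<desc> Test string&#13;\ndo not copy.</desc>".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(sink.toString("UTF-8"));
    }
}
```

Because the filter works on the serialized bytes, it would also strip &#13; from attribute values, and it writes one byte at a time, so wrapping the FileOutputStream in a BufferedOutputStream is probably worthwhile for large files.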

Solution 2

When creating the DOM tree, strip the \r character from the text (thus removing every carriage return from the original text):

String util = ...; // original string data

Element desc = doc.createElement("desc");                   
Node text = doc.createTextNode((util!=null ? stringEscape(util).trim() : ""));
desc.appendChild(text);
externalElement.appendChild(desc);

The escaping method looks like this:

private String stringEscape(String str){
    // Copy every character except carriage returns.
    // (Deleting in place while advancing the index would skip the character after each removal.)
    StringBuilder st = new StringBuilder(str.length());
    for(int i=0; i < str.length(); i++){
        char c = str.charAt(i);
        if(c != '\r'){
            st.append(c);
        }
    }
    return st.toString();
}

I know, it's horrible, but works.
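
As a variation on Solution 2, here is a rough end-to-end sketch that normalizes line endings instead of deleting carriage returns, so a lone \r still ends up as a line break. This is a swapped-in technique, not the code from the answer: the class name, normalizeLineEndings and serialize are hypothetical, and the document is written to a StringWriter rather than a file to keep the example self-contained.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CarriageReturnDemo {

    /** Hypothetical helper: CRLF and lone CR both become a single LF; null becomes "". */
    static String normalizeLineEndings(String text) {
        if (text == null) {
            return "";
        }
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Builds a one-element document around the cleaned text and serializes it to a string. */
    static String serialize(String descText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element desc = doc.createElement("desc");
        // The text node never contains a carriage return, so the serializer has nothing to escape.
        desc.appendChild(doc.createTextNode(normalizeLineEndings(descText)));
        doc.appendChild(desc);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(" Test string\r\ndo not copy."));
    }
}
```

Cleaning the text before it enters the DOM keeps the fix independent of whichever transformer implementation is on the classpath.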

The correct solution, AFAIK, would be to access the HTMLEntities file in Xalan and modify it, thereby forcing the transformer to omit some entities.

  • For stringEscape, something like str.replaceAll("\r", "") is probably much faster.
    – lapo
    Commented Jul 11, 2012 at 7:44
