2

Greetings!

Is it possible to convert an HTML string to an array or JSON using JavaScript?

Something like this:

var stringweb = '<html><head>hi</head><body>my body</body></html>';

And as a result, I would get something like this:

var myarray = {[html,
                  [head,
                     [hi]
                  ]
                 [etc...]
                ]}

Thanks in advance! :)

10
  • HTML (or XML) generally can't be represented as nested lists, although the data it contains sometimes can. At best, it can be represented as nested objects, each with a list of child objects and a dictionary of attributes, but even that is very minimalistic and probably oversimplifies some issues. Commented Jan 5, 2011 at 12:38
  • @delnan - in that (odd) format, it does look like the markup is retained, just in a different way. Commented Jan 5, 2011 at 12:40
  • @Pointy: But it doesn't preserve attributes. Commented Jan 5, 2011 at 12:41
  • Should those values be strings (array = {["html", ...) or nodes somehow? Commented Jan 5, 2011 at 12:42
  • @delnan well we don't know that it doesn't preserve attributes; there are no attributes in the markup example supplied. Commented Jan 5, 2011 at 12:43

2 Answers

2

As you can tell from the comments above, this doesn't seem like the most robust idea. Still, here is a solution that I think gets you what you asked for. It was fun to write, anyhow.

function htmlStringToArray(str) {
  var temp = document.createElement('iframe');
  temp.style.display = "none";
  document.body.appendChild(temp);
  var doc = temp.contentWindow.document;
  doc.open();
  doc.write(str);
  doc.close();

  var array = htmlNodeToArray(doc.documentElement);
  temp.parentNode.removeChild(temp);
  return array;
}

function htmlNodeToArray(node) {
  if (node.nodeType === 1) {
    // Element node: the tag name followed by one entry per child.
    var array = [node.tagName];
    for (var i = 0; i < node.childNodes.length; i++) {
      var child = node.childNodes[i];
      // Keep only element (1) and text (3) children; skip comments etc.
      if (child.nodeType === 1 || child.nodeType === 3) {
        array.push(htmlNodeToArray(child));
      }
    }
    return array;
  } else if (node.nodeType === 3) {
    // Text node: wrap the text in its own single-element array.
    return [node.nodeValue];
  }
}

I tried it out in the latest Chrome, Firefox, and IE. Here it is running on jsbin: http://jsbin.com/uqize3/7/edit

BTW your HTML string is invalid. Browsers will move "hi" from inside the <head> into the <body>. I assumed you intended to have a <title> in there.
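
For what it's worth, here is a rough sketch of a variant that skips the iframe and lets the browser's DOMParser do the parsing instead. This is not the code I tested above: the names nodeToArray and htmlStringToArrayWithParser are made up for illustration, text comes back as plain strings rather than one-element arrays, whitespace-only text is dropped, and it assumes a browser that supports parsing 'text/html' with DOMParser.

```javascript
// Convert a DOM node into a nested array: [tagName, child, child, ...].
// Text nodes become plain strings; whitespace-only text is skipped.
function nodeToArray(node) {
  if (node.nodeType === 3) {
    return node.nodeValue; // text node
  }
  var result = [node.tagName.toLowerCase()];
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType === 1) {
      result.push(nodeToArray(child));
    } else if (child.nodeType === 3 && child.nodeValue.trim() !== '') {
      result.push(child.nodeValue);
    }
  }
  return result;
}

// Browser-only (assumption): DOMParser builds a separate document from the
// string, so nothing has to be added to or removed from the current page.
function htmlStringToArrayWithParser(str) {
  var doc = new DOMParser().parseFromString(str, 'text/html');
  return nodeToArray(doc.documentElement);
}
```

The walking function only reads nodeType, tagName, nodeValue and childNodes, so it can be pointed at any element you already have, not just a freshly parsed document.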

0

You can do that in JavaScript, because JavaScript is expressive enough to allow just about anything. However, it's not going to be particularly easy: you're going to have to implement (or find) an HTML parser complete enough to recognize the particular HTML documents that you want to convert. HTML itself is pretty complicated, and that complexity is greatly magnified by the fact that most of the world's existing HTML documents are badly erroneous. So if you've got well-constrained HTML that you know to be valid, or at least consistently invalid, that might make the task a little easier.

edit — @Hemlock points out, quite wisely, that if you're doing this in a browser (that is, if this code is going to run from inside a web page served to browsers), then you've got it a lot easier. You can hand your HTML over to the browser, perhaps as the content document for an <iframe> element you add to the page. If it's not too awful for the browser to parse (and browsers can cope with surprisingly weird HTML), then once the DOM is ready in the <iframe> you can just walk the DOM and generate whatever sort of different representation you want.
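
As a rough sketch of the "walk the DOM" step, here is one way to build the nested-object form mentioned in the comments on the question (a tag name, a dictionary of attributes, and a list of children), which serializes directly to JSON. The function name nodeToObject and the key names tag, attributes, children and text are my own choices for illustration; nothing here is a standard format.

```javascript
// Turn a DOM node into a plain object that JSON.stringify can handle.
// Elements become { tag, attributes, children }; text becomes { text }.
function nodeToObject(node) {
  if (node.nodeType === 3) {
    return { text: node.nodeValue };
  }
  // Copy the attribute list into a simple name -> value dictionary.
  var attrs = {};
  var list = node.attributes || [];
  for (var i = 0; i < list.length; i++) {
    attrs[list[i].name] = list[i].value;
  }
  // Recurse into element (1) and text (3) children; skip everything else.
  var children = [];
  for (var j = 0; j < node.childNodes.length; j++) {
    var child = node.childNodes[j];
    if (child.nodeType === 1 || child.nodeType === 3) {
      children.push(nodeToObject(child));
    }
  }
  return {
    tag: node.tagName.toLowerCase(),
    attributes: attrs,
    children: children
  };
}
```

Once the <iframe> has finished parsing, something like JSON.stringify(nodeToObject(frameDocument.documentElement)) would give you the JSON string, where frameDocument stands for the iframe's content document.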

2 Comments

I'm assuming he's in the browser, so HTML parsing won't be needed. Then again, why the array if the DOM is present?
@Hemlock well now that's definitely a very good point, though handling complete HTML documents is a little clumsy; I guess he could dump the HTML into an <iframe> or something. I'll update my answer. I haven't had any coffee this morning, in my defense :)
