I want to use a whitelist of tags, attributes, and values to sanitize an HTML string before I place it in the DOM. Can I safely construct a DOM element and traverse it to implement the whitelist filter, assuming that no malicious JavaScript can execute until I append the element to the document? Are there pitfalls to this approach?
No script embedded in the HTML can execute until it is put in the document. Try running this code on any page:
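A minimal sketch of such a test, assuming a browser console. The helper name detachedScriptRan and the marker property are illustrative and not from the original answer, and the check covers only a script tag assigned through innerHTML on a detached element:

```javascript
// Hypothetical helper: returns true if the embedded script ran, false if it
// did not, or null when no DOM is available (for example, outside a browser).
function detachedScriptRan(doc = typeof document !== 'undefined' ? document : null) {
  if (!doc) return null;

  const win = doc.defaultView;
  win.__detachedScriptRan = false; // illustrative marker property

  // If this payload ran, it would set the marker and blank the page.
  // The closing tag is written as <\/script> so the snippet can also be
  // pasted into an inline script block without ending it early.
  const html =
    '<script>window.__detachedScriptRan = true;' +
    'document.body.innerHTML = "";<\/script>';

  const div = doc.createElement('div'); // created but never appended
  div.innerHTML = html;

  return win.__detachedScriptRan;
}

console.log(detachedScriptRan());
```

Outside a browser the call logs null, because there is no document to test against.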
You will notice that nothing changes. If the "malicious" script in the HTML had run, the document would have vanished. So you can use the DOM to sanitize HTML without worrying about bad JS in the HTML, as long as your sanitizer snips out the script, of course. By the way, your approach is pretty safe and smarter than what most people try (parsing it with regex, the poor fools). However, it's best to rely on a good, trusted HTML sanitizing library for this, like HTML Purifier. Or, if you want to do it client-side, you can use ESAPI-JS (recommended by @Brett Zamir).
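The traversal step can be sketched as a recursive whitelist filter. To keep the sketch runnable outside a browser, it works on a plain-object tree ({ tag, attrs, children }, with strings standing in for text nodes) rather than on real DOM nodes. That tree shape, the name sanitizeTree, and the WHITELIST contents are all assumptions made for illustration; in a browser, the same walk would be applied to the children of a detached element:

```javascript
// Hypothetical whitelist: tag name -> attribute name -> predicate on the value.
const WHITELIST = {
  p: {},
  b: {},
  a: {
    href: (value) => /^https?:\/\//i.test(value), // only http(s) links
    title: () => true,                            // any value allowed
  },
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns a sanitized copy of the node, or null if the node must be dropped.
function sanitizeTree(node, whitelist) {
  if (typeof node === 'string') return node; // text nodes pass through

  const tag = node.tag.toLowerCase();
  // hasOwn avoids matching inherited keys such as "constructor".
  if (!hasOwn(whitelist, tag)) return null;  // drop the element and its subtree

  const rules = whitelist[tag];
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    const key = name.toLowerCase();
    // Keep an attribute only if it is listed AND its value passes the check.
    if (hasOwn(rules, key) && rules[key](String(value))) {
      attrs[key] = String(value);
    }
  }

  const children = (node.children || [])
    .map((child) => sanitizeTree(child, whitelist))
    .filter((child) => child !== null);

  return { tag, attrs, children };
}

// Usage with an illustrative input tree.
const input = {
  tag: 'P',
  attrs: { onclick: 'evil()' },
  children: [
    'hi ',
    { tag: 'a', attrs: { href: 'javascript:alert(1)', title: 'x' }, children: ['link'] },
    { tag: 'script', attrs: {}, children: ['evil()'] },
  ],
};
const out = sanitizeTree(input, WHITELIST);
console.log(JSON.stringify(out));
```

In this sketch a disallowed element is removed together with its whole subtree, so the text of a script element is not left behind as visible text. Unwrapping disallowed elements while keeping their children is another possible policy, but it needs more care.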
It doesn't appear that anything will execute until you insert the element into the document, as per @rvighne's answer, but there are at least these (unusual) exceptions (tested in Firefox 27.0):
...or...
...or... (though setUserData is deprecated, it is still working):
...or during iteration...
But without these kinds of (unusual) event interactions, merely building the DOM would not, as far as I have been able to detect, cause any side effects (and of course the examples above are contrived, and one wouldn't expect to encounter them very often, if at all!).