What's the proper way to encode untrusted data for the HTML attribute context? For example:
<input type="hidden" value="<?php echo $data; ?>" />
I usually use htmlentities() or htmlspecialchars() to do this:
<input type="hidden" value="<?php echo htmlentities($data); ?>" />
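For reference, here is a minimal sketch of what htmlspecialchars($data, ENT_QUOTES) emits for the five characters it escapes, written in JavaScript so it can run standalone. escapeAttr is a hypothetical helper, not a built-in, and it ignores the character-set handling PHP's function also does (htmlentities() additionally converts every character that has a named entity). It shows why the URL from the question ends up containing &amp; in the page source.

```javascript
// Hypothetical helper mirroring htmlspecialchars($data, ENT_QUOTES)
// for its five special characters.
function escapeAttr(value) {
  // '&' must be replaced first so the entities added below are not re-escaped.
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const data = "foo?bar=1&baz=2";
console.log(`<input type="hidden" value="${escapeAttr(data)}" />`);
// prints: <input type="hidden" value="foo?bar=1&amp;baz=2" />
```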
However, I recently ran into an issue where this was breaking my application when the data I needed to pass was a URL which needed to be handed off to JavaScript to change the page location:
<input id="foo" type="hidden" value="foo?bar=1&amp;baz=2" />
<script>
// ...
window.location = document.getElementById('foo').value;
// ...
</script>
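When the browser parses the page, it decodes character references inside attribute values, so the DOM holds the original string. Reading .value returns foo?bar=1&baz=2, not foo?bar=1&amp;baz=2. The sketch below models only that decoding step. decodeAttr and its tiny entity table are hypothetical stand-ins for the browser's HTML parser, which knows every named reference in the HTML spec and handles edge cases this version skips.

```javascript
// Simplified model of the parser's character-reference decoding.
// Only a handful of named references are known here.
const NAMED = { amp: "&", quot: '"', apos: "'", lt: "<", gt: ">" };

function decodeAttr(raw) {
  return raw.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, ref) => {
    if (ref[0] === "#") {
      const hex = ref[1] === "x" || ref[1] === "X";
      const code = parseInt(ref.slice(hex ? 2 : 1), hex ? 16 : 10);
      // Out-of-range code points are left untouched in this sketch.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown names are left as-is; only the small table above is modelled.
    return Object.prototype.hasOwnProperty.call(NAMED, ref) ? NAMED[ref] : match;
  });
}

// The attribute text as it appears in the HTML source from the question.
const attributeSource = "foo?bar=1&amp;baz=2";
console.log(decodeAttr(attributeSource)); // foo?bar=1&baz=2
```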
In this case, foo is a C program, and it doesn't understand the encoded characters in the URL and segfaults.
I can simply grab the value in JavaScript and do something like value.replace('&amp;', '&'), but that seems kludgy, and only works for ampersands.
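Apart from handling only ampersands, that workaround has a second gap: when String.prototype.replace is given a string pattern, it replaces only the first occurrence, so a URL with three or more parameters would still carry an encoded ampersand. The URL below is a made-up example.

```javascript
// A made-up URL with two encoded ampersands.
const encoded = "foo?bar=1&amp;baz=2&amp;qux=3";

// String pattern: only the first match is replaced.
const firstOnly = encoded.replace("&amp;", "&");
// Global regex (or replaceAll in ES2021+): every match is replaced.
const everyOne = encoded.replace(/&amp;/g, "&");

console.log(firstOnly); // foo?bar=1&baz=2&amp;qux=3
console.log(everyOne);  // foo?bar=1&baz=2&qux=3
```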
So, my question is: is there a better way to go about the encoding or decoding of data that gets injected into HTML attributes?
I have read all of OWASP's XSS Prevention Cheat Sheet, and it sounds to me like as long as I'm careful to quote my attributes, the only character I need to encode is the quote itself (&quot;), in which case I could use something like str_replace('"', '&quot;', ...). But I'm not sure if I'm understanding it properly.
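Escaping only the double quote does stop the data from ending a double-quoted attribute early, but it is not lossless: the browser decodes character references inside the attribute, so any & in the data that happens to spell out a reference changes when the value is read back. Escaping & before " avoids that. Both helpers below are hypothetical; the first mirrors str_replace('"', '&quot;', ...), the second covers the & and " part of what htmlspecialchars() does.

```javascript
// Hypothetical quote-only escaper, like str_replace('"', '&quot;', ...).
function escapeQuoteOnly(value) {
  return String(value).replace(/"/g, "&quot;");
}

// Escape '&' first, then '"', so existing text is never read as an entity.
function escapeAmpAndQuote(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Data that literally contains the text "&quot;" (e.g. a code snippet).
const data = "say &quot;hi&quot;";

// Unchanged, so the parser will read it back as: say "hi"
console.log(escapeQuoteOnly(data));   // say &quot;hi&quot;
// The parser will read this back as the original data.
console.log(escapeAmpAndQuote(data)); // say &amp;quot;hi&amp;quot;
```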
urlencode() is for encoding URL parameters, not whole URLs, and does not encode for the HTML attribute context. There is a section in the manual that even talks about this: "Leave it as &, but simply encode your URLs using htmlentities() or htmlspecialchars()."
window.location = document.getElementById('foo');? I think that should be window.location = document.getElementById('foo').value;, and with that it redirects to the right page (foo?bar=1&baz=2).
It redirects to foo?bar=1&amp;baz=2. PHP is able to understand this, but foo is not a PHP script, and just crashes unless the URL is like foo?bar=1&baz=2.
The .value of the input in your case is foo?bar=1&baz=2, as demonstrated here. Your script as posted won't result in a redirect to foo?bar=1&amp;baz=2 but to foo?bar=1&baz=2.
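On the urlencode() point: percent-encoding belongs on the individual parameter values, not on the whole URL. In this sketch, buildQueryUrl is a hypothetical helper and URLSearchParams (form-style encoding, like PHP's http_build_query()) plays the role of urlencode(). Applying encodeURIComponent to the whole URL shows what goes wrong; urlencode() would likewise escape the ?, = and & separators. The resulting URL still needs htmlspecialchars() before it goes into the value attribute.

```javascript
// Hypothetical helper: percent-encode each parameter, then join with '&'.
function buildQueryUrl(path, params) {
  // URLSearchParams uses application/x-www-form-urlencoded rules:
  // '&' inside a value becomes %26 and a space becomes '+'.
  return `${path}?${new URLSearchParams(params).toString()}`;
}

console.log(buildQueryUrl("foo", { bar: "1", baz: "2" }));
// foo?bar=1&baz=2
console.log(buildQueryUrl("foo", { bar: "1", note: "a&b c" }));
// foo?bar=1&note=a%26b+c

// Encoding the whole URL also escapes the separators, so the result
// is no longer a path with a query string at all.
console.log(encodeURIComponent("foo?bar=1&baz=2"));
// foo%3Fbar%3D1%26baz%3D2
```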