Escaping HTML entities in JavaScript string literals within the <script> block

Question

On the one hand if I have

<script>
var s = 'Hello </script>';
console.log(s);
</script>

the browser will terminate the <script> block early, at the </script> inside the string literal, and the rest of the page gets screwed up.

On the other hand, the value of the string may come from a user (say, via a previously submitted form, with the string now being inserted into a <script> block as a literal), so you can expect anything in it, including maliciously formed tags. Now, if I escape the string literal with htmlentities() when generating the page, the value of s will contain the escaped entities literally, i.e. printing s will output

Hello &lt;/script&gt;

which is not desired behavior in this case.

One way of properly escaping JS strings within a <script> block is escaping the slash if it follows the left angle bracket, or just always escaping the slash, i.e.

var s = 'Hello <\/script>';

This seems to be working fine.

Then comes the question of JS code within HTML event handlers, which can be easily broken too, e.g.

<div onClick="alert('Hello ">')"></div>

looks valid at first but breaks in most (or all?) browsers, because the double quote inside the string ends the attribute value. This obviously requires full HTML entity encoding.

My question is: what is the best/standard practice for properly covering all the situations above - i.e. JS within a script block, JS within event handlers - if your JS code can partly be generated on the server side and can potentially contain malicious data?

possible duplicate of JavaScript and error "end tag for element which is not open" — Quentin, Jan 5 '12 at 21:47

ThinkingStiff · Accepted Answer · 2012-01-05 23:51:01Z

The following characters could interfere with an HTML or Javascript parser and should be escaped in string literals: <, >, ", ', \, and &.

In a script block, using the escape character works, as you found out. The concatenation method (</scr' + 'ipt>') also works but can be hard to read.

var s = 'Hello <\/script>';

For inline Javascript in HTML, you can use entities:

<div onClick="alert('Hello &quot;>')">click me</div>

Demo: http://jsfiddle.net/ThinkingStiff/67RZH/
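If the handler is generated on the server, the entity approach needs two layers: first escape the data as a JS string literal, then entity-encode the whole handler for the attribute. A minimal sketch of that idea, assuming a double-quoted attribute and a single-quoted JS literal; the function names are hypothetical and not from any library:

```javascript
// Hypothetical helpers; names are illustrative only.

// Layer 1: make the text safe inside a single-quoted JS string literal.
function jsSingleQuoted(text) {
  const escaped = String(text)
    .replace(/\\/g, '\\\\') // backslash first, so later escapes are not doubled
    .replace(/'/g, "\\'")   // the quote that delimits the literal
    .replace(/\r/g, '\\r')
    .replace(/\n/g, '\\n');
  return "'" + escaped + "'";
}

// Layer 2: make the whole handler safe inside a double-quoted HTML attribute.
function escapeHtmlAttribute(text) {
  return String(text)
    .replace(/&/g, '&amp;') // ampersand first, so entities are not re-escaped
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function divWithAlert(message) {
  const handler = 'alert(' + jsSingleQuoted(message) + ')';
  return '<div onclick="' + escapeHtmlAttribute(handler) + '">click me</div>';
}

console.log(divWithAlert('Hello ">'));
// prints: <div onclick="alert('Hello &quot;&gt;')">click me</div>
```

The order matters: the browser decodes the entities before it hands the attribute value to the JS parser, so the server has to apply the JS escaping first and the HTML escaping last.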

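The method that works in both <script> blocks and inline JavaScript is the \uxxxx escape, where xxxx is the four-digit hexadecimal character code. It only applies to characters inside a string literal, not to the quotes that delimit the literal.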

< - \u003c
> - \u003e
" - \u0022
' - \u0027
\ - \u005c
& - \u0026

Demo: http://jsfiddle.net/ThinkingStiff/Vz8n7/

HTML:

<div onClick="alert('Hello \u0022>')">click me</div>

<script>
    var s = 'Hello \u003c/script\u003e';
    alert( s );
</script>
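The "one basic server-side function" could look roughly like this. It is only a sketch: the function name is made up, and the line-break characters in the pattern are my addition to the six characters listed above, since a raw newline would also end a quoted string literal.

```javascript
// Hypothetical helper: replace risky characters with \uXXXX escapes.
// The result is meant to be placed between the quotes of a JS string literal.
function escapeForJsString(text) {
  // The six characters from the list above, plus line terminators
  // (CR, LF, U+2028, U+2029), which are an addition in this sketch.
  return String(text).replace(/[<>"'\\&\r\n\u2028\u2029]/g, function (ch) {
    return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

console.log(escapeForJsString('Hello </script>'));
// prints: Hello \u003c/script\u003e
```

Because the output contains no <, >, quotes, or & at all, the same escaped text can go into a literal in a <script> block or in an inline handler without a second HTML-encoding pass.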

The hex escape method is the best so far: you don't have to worry where your string ends up in the code, just send everything through one basic server-side function. Great, I like it! — mojuba, Jan 5 '12 at 23:47

Jamie Treworgy · Answer 2 · 2012-01-05 20:27:03Z

(edit - somehow didn't notice you mentioned slash-escape in your question already...)

OK so you know how to escape a slash.

In inline event handlers, you can't use the bounding character inside a literal, so use the other one:

<div onClick='alert("Hello \"")'>test</div>

But this is all in aid of making your life difficult. Just don't use inline event handlers! Or if you absolutely must, then have them call a function defined elsewhere.

Generally speaking, there are few reasons for your server-side code to be writing javascript. Don't generate scripts from the server - pass data to pre-written scripts instead.
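As a rough illustration of "pass data instead": the server emits the value as JSON in an HTML attribute, and a pre-written script reads it back. This is a sketch under assumptions; the helper names, the data-payload attribute name, and the element id are all made up for the example.

```javascript
// Hypothetical server-side helpers; names are illustrative only.

// Escape text for use inside a double-quoted HTML attribute.
function escapeHtmlAttribute(text) {
  return String(text)
    .replace(/&/g, '&amp;') // ampersand first, so entities are not re-escaped
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build a hidden element that carries any JSON-serializable value.
function dataCarrier(id, value) {
  const json = JSON.stringify(value);
  return '<span style="display:none;" id="' + escapeHtmlAttribute(id) +
    '" data-payload="' + escapeHtmlAttribute(json) + '"></span>';
}

console.log(dataCarrier('mule', { text: 'Hello </script>' }));

// In the browser, a static, cacheable script would read it back with:
//   var value = JSON.parse(document.getElementById('mule').dataset.payload);
```

Only one escaping rule (HTML attribute encoding) is involved here, because the data never passes through the JS parser as source code.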

(original)

You can escape anything in a JS string literal with a backslash (that is not otherwise a special escape character):

var s = 'Hello <\/script>';

This also has the positive effect of causing it to not be interpreted as html. So you could do a blanket replace of "/" with "\/" to no ill effect.
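A sketch of what that blanket replace might look like when the literal is built from arbitrary text. The function name is hypothetical, and escaping the backslash, quote, and newlines is an assumption about what else such a helper needs so that the result is still one valid literal:

```javascript
// Hypothetical helper: build a single-quoted JS literal for a <script> block.
function toScriptBlockLiteral(text) {
  const escaped = String(text)
    .replace(/\\/g, '\\\\') // backslash first, so later escapes are not doubled
    .replace(/'/g, "\\'")   // the quote that delimits the literal
    .replace(/\r/g, '\\r')
    .replace(/\n/g, '\\n')
    .replace(/\//g, '\\/'); // blanket replace: "</script>" becomes "<\/script>"
  return "'" + escaped + "'";
}

console.log('var s = ' + toScriptBlockLiteral('Hello </script>') + ';');
// prints: var s = 'Hello <\/script>';
```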

Generally, though, I am concerned that you would have user-submitted data embedded as a string literal in javascript. Are you generating javascript code on the server? Why not just pass data as JSON or an HTML "data" attribute or something instead?

edited Jan 5 '12 at 20:27

answered Jan 5 '12 at 20:13

Re: passing strings to JS: it's a valid point to use, say, JSON instead, but I'm trying to save some traffic and connections by inserting data directly into HTML/JS. For small amounts of data I think it's OK. – mojuba Jan 5 '12 at 20:48

This technique can only cost you in terms of bandwidth, since such scripts cannot be cached by the browser. Quick and dirty, stick it in a hidden element: <span style="display:none;" id="mule" data-text="... attribute-encoded text or JSON structure"></span> There's no rule against doing it however you want, but it sure saves a lot of headaches and makes for easier, more secure, more maintainable code to avoid generating scripts. – Jamie Treworgy Jan 5 '12 at 20:52

Re: your solution with reverting the bounding characters will require my server-side code to look for quotes within my JS snippet and decide whether it should be enclosed in single or double quotes. Getting too complicated. Far easier to just escape everything like any HTML literal text. – mojuba Jan 5 '12 at 20:53

Except it won't work reliably because javascript isn't a literal. You need to combine the rules for escaping within javascript literals, and the rules for escaping within an HTML element, which is pretty darn complicated all of a sudden. A double-quote inside single-quotes becomes &quot; but what about a double-quote that's bounding a string literal? Answer is simple: avoid inline scripts. Pass data instead. – Jamie Treworgy Jan 5 '12 at 21:01

To be honest, I already fixed my code and it works. Rule #1: when generating a JS string literal on the server, escape quotes, newlines and slash with backslash. Rule #2: when inserting anything into HTML other than JS code in the script block, escape as usual with htmlentities(). – mojuba Jan 5 '12 at 21:08

hugomg · Answer 3 · 2012-01-05 20:21:09Z

I'd say the best practice would be avoiding inline JS in the first place.

Put the JS code in a separate file and include it with the src attribute

<script src="path/to/file.js"></script>

and use it to set event handlers from the script instead of putting them in the HTML.

// jQuery example: attach the handler from the script, not from the markup
$('div.something').on('click', function () {
    alert('Hello>');
});

answered Jan 5 '12 at 20:21

And what if I have my reasons for using inline code? For efficiency, saving traffic, connections, etc. on a highly loaded web site. – mojuba Jan 5 '12 at 20:49

@mojuba: Well, by the time you get to this kind of performance tuning most best practices have already been thrown out the window :) – hugomg Jan 5 '12 at 21:18

Diodeus · Answer 4 · 2012-01-05 20:08:02Z

Most people use this trick:

var s = 'Hello </scr' + 'ipt>';

answered Jan 5 '12 at 20:08

So if the code is generated on the server side, I need to look for <script> and replace it with the broken one? Isn't it easier to just escape the slashes? – mojuba Jan 5 '12 at 20:47
