It's hard to put everything in the question's title, since the specifics depend on the string being split. But here it is:

I have a string in which there are multiple script tags:

<script type="text/javascript" src="/javascripts/something-1.js"></script>
<script type="text/javascript" src="/javascripts/something-2.js"/>
<script type="text/javascript" src="/javascripts/something-3.js"></script>
<link rel="stylesheet" type="text/css" href="/something-1.css">

I want to split this string into multiple strings, each containing one script tag (ignoring link tags). This is how I did it:

var scripts = code.match(/<script.*src=.*(\/>|<\/script>)/g);

This is meant to match script tags whose closing tag is either /> or </script>. However, with the current regex, I always get:

<script type="text/javascript" src="/javascripts/something-1.js"></script>
<script type="text/javascript" src="/javascripts/something-2.js"/>

as one string, not two.

How do I write a regex that does something like:

/<script.*src=( (not script not link) /> | (not link) <\/script> )/g

4 Answers

Change .* to .*? to match as little as possible rather than as much as possible.
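
As a minimal sketch of the difference, assuming the whole input sits on a single line (the sample string and the variable names greedy and lazy are illustrative, not taken from the asker's actual page):

```javascript
// Sample input on a single line (an assumption for this illustration).
const code =
  '<script type="text/javascript" src="/javascripts/something-1.js"></script>' +
  '<script type="text/javascript" src="/javascripts/something-2.js"/>' +
  '<script type="text/javascript" src="/javascripts/something-3.js"></script>' +
  '<link rel="stylesheet" type="text/css" href="/something-1.css">';

// Greedy: .* grabs as much as it can, so several tags merge into one match.
const greedy = code.match(/<script.*src=.*(\/>|<\/script>)/g);

// Lazy: .*? stops at the first place where the rest of the pattern can match.
const lazy = code.match(/<script.*?src=.*?(\/>|<\/script>)/g);

console.log(greedy.length); // 1
console.log(lazy.length);   // 3
```

The lazy version returns one array entry per script tag and leaves the link tag out, because the pattern still requires a match to start with <script.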


Another way to handle this is to parse the page as a partial or full XML document with xmldom and read the "src" attribute of each "script" element. It's a reliable way to grab the scripts and avoid the links.


I would use something like

var rx = /<script.+?src=.+?\/(script)?>/gim;

This will match anything:

  • starting with <script
  • having at least 1 more character (can be a space, for example, or some other attribute-value pairs)
  • having src=
  • having at least 1 more character
  • then either /> or /script>

and the flags…

  • the i flag is for case insensitivity
  • the g flag is for multiple matches
  • the m flag is for multiline sources (assuming these lines will actually be lines themselves and not a single line in total)

EDIT: I hadn't taken into account the possibility of a GET parameter like &src=etc appearing in the URL inside the src attribute's value.
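
For reference, a quick sketch of this regex run against the markup from the question. The names code and tags are illustrative, and the sketch assumes each tag sits on its own line:

```javascript
// The regex from this answer.
const rx = /<script.+?src=.+?\/(script)?>/gim;

// The question's sample markup, one tag per line (assumed layout).
const code = [
  '<script type="text/javascript" src="/javascripts/something-1.js"></script>',
  '<script type="text/javascript" src="/javascripts/something-2.js"/>',
  '<script type="text/javascript" src="/javascripts/something-3.js"></script>',
  '<link rel="stylesheet" type="text/css" href="/something-1.css">'
].join('\n');

// With the g flag, match() returns every full match and drops capture groups.
const tags = code.match(rx);

console.log(tags.length); // 3
```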

The m flag is irrelevant. It only matters if you're using anchors (^ and $) to match the beginning and end of lines. – Alan Moore Apr 25 at 1:57
Excuse my ignorance (this is a serious question) but what if the closing tag is on the following line, for example? Would omitting the m flag still let it be caught? – inhan Apr 25 at 2:02
In that case, your regex would fail because . doesn't match newlines, and multiline mode doesn't change that. All it does is change the behavior of ^ and $, allowing them to match at line boundaries as well as at the beginning and end of the whole string. The oft-repeated advice "if the source string is multiline, you must use multiline mode" is wrong. You may be thinking of single-line or DOTALL mode, which enables . to match any character, but JavaScript doesn't support that. – Alan Moore Apr 25 at 3:05
@Alan thanks for the explanation. – inhan Apr 25 at 3:11
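
To illustrate the point made in these comments, here is a small sketch with a hypothetical input whose closing tag sits on the next line. The variable names are made up for the example. Note that engines implementing ES2018 or later also accept an s (dotAll) flag, which postdates this discussion; the [\s\S] character class works in older engines too.

```javascript
// Hypothetical input: the closing tag is on the following line.
const twoLines =
  '<script type="text/javascript" src="/javascripts/something-1.js">\n</script>';

// With m: still no match, because . never matches \n and m only affects ^ and $.
const withM = twoLines.match(/<script.+?src=.+?\/(script)?>/gim);

// [\s\S] matches any character at all, newlines included.
const withClass = twoLines.match(/<script[\s\S]+?src=[\s\S]+?\/(script)?>/gi);

// The s flag (ES2018 and later) makes . match newlines as well.
const withS = twoLines.match(/<script.+?src=.+?\/(script)?>/gis);

console.log(withM);            // null
console.log(withClass.length); // 1
console.log(withS.length);     // 1
```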

Generally speaking, what you're trying to do is not possible. But if you can make certain simplifying assumptions about the source string, you can create a regex that's good enough. Here's what I would try:

/<script(?:\s+\w+\s*=\s*"[^"]*")+\s*\/?>(?:<\/script>)?/gi

Explanation (the forward slashes are escaped as \/ so the pattern is a valid JavaScript regex literal):

  • <script matches the beginning of the start tag.

  • (?:\s+\w+\s*=\s*"[^"]*")+ consumes one or more attributes.

  • \s*\/?> matches the end of the start tag. If it's a self-closing tag, the \/? consumes the slash.

  • (?:<\/script>)? otherwise, this matches the end tag.

The basic idea is to replace the .* with something that can't match the > at the end of the start tag and thus "escape" to match more than you want. Of course, there are no guarantees. I don't even know if your HTML is valid, and there are many ways this regex can be fooled even in valid HTML.
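
A rough sketch of this approach on the question's sample markup, with the forward slashes escaped so the pattern is a valid JavaScript literal. The names code, tags, and fooled are illustrative, and the single-quoted tag is a made-up input showing one way the regex can be fooled:

```javascript
// The attribute-by-attribute regex, slashes escaped for a JS literal.
const rx = /<script(?:\s+\w+\s*=\s*"[^"]*")+\s*\/?>(?:<\/script>)?/gi;

// The question's sample markup (assumed to be one tag per line).
const code = [
  '<script type="text/javascript" src="/javascripts/something-1.js"></script>',
  '<script type="text/javascript" src="/javascripts/something-2.js"/>',
  '<script type="text/javascript" src="/javascripts/something-3.js"></script>',
  '<link rel="stylesheet" type="text/css" href="/something-1.css">'
].join('\n');

const tags = code.match(rx);
console.log(tags.length); // 3

// Hypothetical counterexample: single-quoted attribute values are not handled.
const fooled = "<script src='/javascripts/something-4.js'></script>".match(rx);
console.log(fooled); // null
```

Because [^"]* cannot run past the closing quote of an attribute, each match stays inside its own tag without relying on lazy quantifiers.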

