It's hard to put everything in the question's title, since the specifics depend on which string you want to split. But here it is:

I have a string in which there are multiple script tags:

<script type="text/javascript" src="/javascripts/something-1.js"></script>
<script type="text/javascript" src="/javascripts/something-2.js"/>
<script type="text/javascript" src="/javascripts/something-3.js"></script>
<link rel="stylesheet" type="text/css" href="/something-1.css">

I want to split this string into multiple strings, each containing one script tag (ignoring link tags). This is how I did it:

var scripts = code.match(/<script.*src=.*(\/>|<\/script>)/g);

This is meant to match script tags that close with either /> or </script>. However, with the current regex, I always get:

<script type="text/javascript" src="/javascripts/something-1.js"></script>
<script type="text/javascript" src="/javascripts/something-2.js"/>

as a single string, not two.

How do I write a regex for something like:

/<script.*src=( (not script not link) /> | (not link) <\/script> )/g

4 Answers

Change .* to .*? to match as little as possible rather than as much as possible.
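
A minimal sketch of the difference, assuming the question's markup sits on a single line in the variable `code` (the name comes from the question; `greedy` and `lazy` are illustrative names only):

```javascript
// The question's markup as one line (no newlines), so that . can run across tags.
var code =
  '<script type="text/javascript" src="/javascripts/something-1.js"></script>' +
  '<script type="text/javascript" src="/javascripts/something-2.js"/>' +
  '<script type="text/javascript" src="/javascripts/something-3.js"></script>' +
  '<link rel="stylesheet" type="text/css" href="/something-1.css">';

// Greedy: each .* grabs as much as it can, so one match swallows all three tags.
var greedy = code.match(/<script.*src=.*(\/>|<\/script>)/g);

// Lazy: each .*? stops at the first place the rest of the pattern can match.
var lazy = code.match(/<script.*?src=.*?(\/>|<\/script>)/g);

console.log(greedy.length); // 1
console.log(lazy.length);   // 3
console.log(lazy[1]);       // <script type="text/javascript" src="/javascripts/something-2.js"/>
```

The lazy version works on this sample, but it can still be fooled by input such as a "/>" inside an attribute value.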


Another way to handle this is to load the page as a partial or full XML document with xmldom and read the "src" attribute of each "script" element. That is a reliable way to grab the scripts and skip the links.


I would use something like

var rx = /<script.+?src=.+?\/(script)?>/gim;

This will match anything:

  • starting with <script
  • having at least 1 more character (can be a space, for example, or some other attribute-value pairs)
  • having src=
  • having at least 1 more character
  • then either /> or /script>

and the flags…

  • the i flag is for case insensitivity
  • the g flag is for multiple matches
  • the m flag is for multiline sources (assuming these lines will actually be lines themselves and not a single line in total)

EDIT: I hadn't taken into account the possibility of a GET parameter like &src=etc appearing in the URL inside the src attribute's value.

    
The m flag is irrelevant. It only matters if you're using anchors (^ and $) to match the beginning and end of lines. –  Alan Moore Apr 25 '12 at 1:57
    
Excuse my ignorance (this is a serious question) but what if the closing tag is on the following line, for example? Would omitting the m flag still let it be caught? –  inhan Apr 25 '12 at 2:02
    
In that case, your regex would fail because . doesn't match newlines, and multiline mode doesn't change that. All it does is change the behavior of ^ and $, allowing them to match at line boundaries as well as at the beginning and end of the whole string. The oft-repeated advice "if the source string is multiline, you must use multiline mode" is wrong. You may be thinking of single-line or DOTALL mode, which enables . to match any character, but JavaScript doesn't support that. –  Alan Moore Apr 25 '12 at 3:05
    
@Alan thanks for the explanation. –  inhan Apr 25 '12 at 3:11
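
The point about the m flag in the comments above can be checked with a short sketch. The sample string and the variable names (`multiline`, `withM`, `withoutM`, `anyChar`) are made up for illustration, and the `[\s\S]` substitution is a common workaround rather than something proposed in this thread:

```javascript
// A script element whose closing tag sits on the next line.
var multiline =
  '<script type="text/javascript" src="/javascripts/something-1.js">\n</script>';

// The answer's pattern with and without the m flag: neither matches,
// because . never matches "\n" and the pattern has no ^ or $ anchors.
var withM = multiline.match(/<script.+?src=.+?\/(script)?>/gim);
var withoutM = multiline.match(/<script.+?src=.+?\/(script)?>/gi);

// Swapping . for [\s\S] (any character, newlines included) does match.
var anyChar = multiline.match(/<script[\s\S]+?src=[\s\S]+?\/(script)?>/gi);

console.log(withM);          // null
console.log(withoutM);       // null
console.log(anyChar.length); // 1
```

Note that these comments date from 2012. Engines that implement ES2018 or later also accept an s (dotAll) flag, which lets . match newlines directly.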

Generally speaking, what you're trying to do is not possible. But if you can make certain simplifying assumptions about the source string, you can create a regex that's good enough. Here's what I would try:

/<script(?:\s+\w+\s*=\s*"[^"]*")+\s*\/?>(?:<\/script>)?/gi

explanation:

  • <script matches the beginning of the start tag.

  • (?:\s+\w+\s*=\s*"[^"]*")+ consumes one or more double-quoted attributes.

  • \s*\/?> matches the end of the start tag. If it's a self-closing tag, the \/? consumes the slash.

  • (?:<\/script>)? otherwise, this matches the end tag that immediately follows.

The basic idea is to replace the .* with something that can't match the > at the end of the start tag and thus "escape" to match more than you want. Of course, there are no guarantees. I don't even know if your HTML is valid, and there are many ways this regex can be fooled even in valid HTML.
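
As a rough check, here is the pattern applied to the question's markup, written as a JavaScript literal with the forward slashes escaped. The single-line `code` string, the `rx`, `scripts` and `fooled` names, and the single-quoted counterexample are assumptions added for illustration:

```javascript
// The question's markup on one line.
var code =
  '<script type="text/javascript" src="/javascripts/something-1.js"></script>' +
  '<script type="text/javascript" src="/javascripts/something-2.js"/>' +
  '<script type="text/javascript" src="/javascripts/something-3.js"></script>' +
  '<link rel="stylesheet" type="text/css" href="/something-1.css">';

// Attributes are matched as name="value" pairs, so nothing can run past
// the > that ends the start tag.
var rx = /<script(?:\s+\w+\s*=\s*"[^"]*")+\s*\/?>(?:<\/script>)?/gi;

var scripts = code.match(rx);
console.log(scripts.length); // 3
console.log(scripts[1]);     // <script type="text/javascript" src="/javascripts/something-2.js"/>

// One way the pattern is fooled: valid HTML with a single-quoted attribute.
var fooled = "<script src='/javascripts/something-4.js'></script>".match(rx);
console.log(fooled);         // null
```

The link tag is skipped because the pattern only ever starts at <script, and the single-quoted case shows the kind of simplifying assumption the answer warns about.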

