It's hard to fit everything into the question's title, since the specifics depend on the string being split. Here is the problem:

I have a string in which there are multiple script tags:

<script type="text/javascript" src="/javascripts/something-1.js"></script>
<script type="text/javascript" src="/javascripts/something-2.js"/>
<script type="text/javascript" src="/javascripts/something-3.js"></script>
<link rel="stylesheet" type="text/css" href="/something-1.css">

I want to split this string into multiple strings, each containing one script tag (ignoring link tags). This is what I tried:

var scripts = code.match(/<script.*src=.*(\/>|<\/script>)/g);

This is meant to match script tags closed by either /> or </script>. However, with this regex I always get:

<script type="text/javascript" src="/javascripts/something-1.js"></script>
<script type="text/javascript" src="/javascripts/something-2.js"/>

as a single string, not two.

How do I regex something like:

/<script.*src=( (not script not link) /> | (not link) <\/script> )/g

4 Answers

2

Change .* to .*? to match as little as possible rather than as much as possible.
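
As a minimal sketch of the difference, assuming the tags sit on a single line with no newlines between them (the `html` variable is illustrative and is built from the question's sample markup):

```javascript
// Sample markup from the question, joined onto one line (an assumption).
const html = [
  '<script type="text/javascript" src="/javascripts/something-1.js"></script>',
  '<script type="text/javascript" src="/javascripts/something-2.js"/>',
  '<script type="text/javascript" src="/javascripts/something-3.js"></script>',
  '<link rel="stylesheet" type="text/css" href="/something-1.css">',
].join("");

// Greedy: .* runs as far as it can, so the tags are swallowed into one match.
const greedy = html.match(/<script.*src=.*(\/>|<\/script>)/g);

// Lazy: .*? stops at the first possible closing, so each tag is its own match.
const lazy = html.match(/<script.*?src=.*?(\/>|<\/script>)/g);

console.log(greedy.length); // 1
console.log(lazy.length);   // 3
```

The lazy version still assumes every script tag carries a src attribute; an inline script without one would let the match run on into the next tag.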

Comments

2

Another way to handle this is to parse the page as a partial or full XML document with xmldom, then select the "script" elements and read their "src" attribute. That is a reliable way to grab the scripts and skip the links.

Comments

2

Generally speaking, what you're trying to do is not possible. But if you can make certain simplifying assumptions about the source string, you can create a regex that's good enough. Here's what I would try:

/<script(?:\s+\w+\s*=\s*"[^"]*")+\s*\/?>(?:<\/script>)?/gi

Explanation (the forward slashes are escaped because this is a JavaScript regex literal):

  • <script matches the beginning of the start tag.

  • (?:\s+\w+\s*=\s*"[^"]*")+ consumes one or more double-quoted attributes.

  • \s*\/?> matches the end of the start tag. If it's a self-closing tag, the \/? consumes the slash.

  • (?:<\/script>)? matches the end tag, if there is one.

The basic idea is to replace the .* with something that can't match the > at the end of the start tag and thus "escape" to match more than you want. Of course, there are no guarantees. I don't even know if your HTML is valid, and there are many ways this regex can be fooled even in valid HTML.
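
A short sketch of this pattern applied to the question's sample, with the forward slashes escaped so it is a valid JavaScript regex literal. The `code` variable name comes from the question; placing one tag per line is an assumption about how the source string is laid out:

```javascript
// Sample markup from the question, one tag per line (an assumption).
const code = [
  '<script type="text/javascript" src="/javascripts/something-1.js"></script>',
  '<script type="text/javascript" src="/javascripts/something-2.js"/>',
  '<script type="text/javascript" src="/javascripts/something-3.js"></script>',
  '<link rel="stylesheet" type="text/css" href="/something-1.css">',
].join("\n");

// Attributes are matched as name="value" pairs, so the pattern cannot
// run past the > that ends the start tag.
const scriptTag = /<script(?:\s+\w+\s*=\s*"[^"]*")+\s*\/?>(?:<\/script>)?/gi;

const scripts = code.match(scriptTag);

console.log(scripts.length); // 3
console.log(scripts[1]);     // the self-closing something-2.js tag
```

Single-quoted or unquoted attribute values, and attribute names containing a hyphen, are not covered by this pattern.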

Comments

1

I would use something like

var rx = /<script.+?src=.+?\/(script)?>/gim;

This will match anything:

  • starting with <script
  • having at least 1 more character (can be a space, for example, or some other attribute-value pairs)
  • having src=
  • having at least 1 more character
  • then either /> or /script>

and the flags…

  • the i flag is for case insensitivity
  • the g flag is for multiple matches
  • the m flag is for multiline sources (assuming these lines will actually be lines themselves and not a single line in total)

EDIT: I hadn't taken into account the possibility of a GET parameter such as &src=etc appearing in the URL that is the value of the src attribute.

3 Comments

The m flag is irrelevant. It only matters if you're using anchors (^ and $) to match the beginning and end of lines.
Excuse my ignorance (this is a serious question), but what if the closing tag is on the following line, for example? Would omitting the m flag still let it be caught?
In that case, your regex would fail because . doesn't match newlines, and multiline mode doesn't change that. All it does is change the behavior of ^ and $, allowing them to match at line boundaries as well as at the beginning and end of the whole string. The oft-repeated advice "if the source string is multiline, you must use multiline mode" is wrong. You may be thinking of single-line or DOTALL mode, which enables . to match any character. JavaScript did not support that when this was written (the s flag was added in ES2018); [\s\S] is the usual substitute.
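
The point in this last comment can be checked with a small sketch. The `source` string is a made-up input with the closing tag on the next line, and swapping . for [\s\S] is one common workaround rather than anything proposed in the answer itself:

```javascript
// Hypothetical input: the closing tag sits on the line after the start tag.
const source = '<script src="/javascripts/something-1.js">\n</script>';

// The m flag only changes ^ and $, so . still stops at the newline.
const withM = source.match(/<script.+?src=.+?\/(script)?>/gim);

// [\s\S] matches any character, newlines included.
const anyChar = source.match(/<script[\s\S]+?src=[\s\S]+?\/(script)?>/gi);

console.log(withM);             // null
console.log(anyChar.length);    // 1
```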
