2
\$\begingroup\$

I've nearly got this working. I wanted to know if there is a much better way. One problem is that no matter what there will be cases where a URL is incorrectly identified and as such the end result will be dependent on requirements. My requirements are lackadaisy, I just want something that works most of the time. However some of the URLs in my fiddle are not being detected correctly and I'd like advice on the regex part of this for overcome these. The main issue here is that the overall design is in question, the should be a simpler method.

Root problem

Fiddle

function replaceURLWithHTMLLinks(text) {
    text = text.replace(/a/g, "--ucsps--");
    text = text.replace(/b/g, "--uspds--");
    var arrRegex = [
        /(\([^)]*\b)((?:https?|ftp|file):\/\/[-A-Za-z0-9+&@#\/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#\/%=~_()|])(.?\))/ig,
        /()(\b(?:https?|ftp|file):\/\/[-a-z0-9+&@#\/%?=~_()|!:,.;]*[-a-z0-9+&@#\/%=~_()|])(.?\b)/ig];
    for (i = 0; i < arrRegex.length; i++) {
        text = text.replace(arrRegex[i], "$1a$2b$3");
    }
    text = text.replace(/a([^b]*)b/g, "<a href='$1'>$1</a>");
    text = text.replace(/--ucsps--/g, "a");
    text = text.replace(/--uspds--/g, "b");
    return text;
}
var elm = document.getElementById('trythis');
elm.innerHTML = replaceURLWithHTMLLinks(elm.innerHTML);

Any thoughts?

\$\endgroup\$
5
  • \$\begingroup\$ what browser are you using? all of the links show up for me in chrome. and the code seems to be working. I am not very good at JavaScript, but there doesn't seem to be anything that really jumps out at me. \$\endgroup\$ Commented Oct 28, 2013 at 20:27
  • 1
    \$\begingroup\$ I think that this contains code that does what the OP wants it too, but the OP wants a Better way to Handle what it does. the OP doesn't need the Javascript to catch all of the URLS just Most of them, right?? \$\endgroup\$ Commented Oct 29, 2013 at 1:43
  • 1
    \$\begingroup\$ I'm using Chromium. Some of the URLs have extra 'junk' at the end. Specifically the last few have "." and ".)". \$\endgroup\$ Commented Oct 29, 2013 at 16:49
  • 1
    \$\begingroup\$ I'd like as many browsers to catch as many URLs as possible. This should be common code that everyone can use for any project, I'm amazed that there are so few good examples. \$\endgroup\$ Commented Oct 29, 2013 at 16:51
  • \$\begingroup\$ @Malachi “some of the URLs […] are not being detected correctly and I'd like advice […] to overcome these” That sounds to me like it doesn't work the way Mike wants. \$\endgroup\$ Commented Oct 30, 2013 at 1:22

1 Answer 1

2
\$\begingroup\$

I think that it's nutty to replace all characters 'a' and 'b' with junk, then un-replace them later, just so that you can use 'a' and 'b' as landmarks to indicate which points your two regular expressions have in common. If you're going to do that, then why not just use "--ucsps--" and "--uspds--" as your landmarks in the first place? Of course, using any special strings as landmarks is a bad idea.

The fundamental problem is that you want to exclude the same number of closing parentheses as there are open parentheses. You can't use a regular expression to match left and right parentheses, because such an expression would not be regular. Well, some flavours of "regular" expressions support backreferences within the expression itself, which could do the job. RegExps in JavaScript don't.

What JavaScript can do, though, is let you specify your own code to generate the replacement text. Here's what I came up with:

function replaceURLWithHTMLLinks(text) {
    var re = /(\(.*?)?\b((?:https?|ftp|file):\/\/[-a-z0-9+&@#\/%?=~_()|!:,.;]*[-a-z0-9+&@#\/%=~_()|])/ig;
    return text.replace(re, function(match, lParens, url) {
        var rParens = '';
        lParens = lParens || '';

        // Try to strip the same number of right parens from url
        // as there are left parens.  Here, lParenCounter must be
        // a RegExp object.  You cannot use a literal
        //     while (/\(/g.exec(lParens)) { ... }
        // because an object is needed to store the lastIndex state.
        var lParenCounter = /\(/g;
        while (lParenCounter.exec(lParens)) {
            var m;
            // We want m[1] to be greedy, unless a period precedes the
            // right parenthesis.  These tests cannot be simplified as
            //     /(.*)(\.?\).*)/.exec(url)
            // because if (.*) is greedy then \.? never gets a chance.
            if (m = /(.*)(\.\).*)/.exec(url) ||
                    /(.*)(\).*)/.exec(url)) {
                url = m[1];
                rParens = m[2] + rParens;
            }
        }
        return lParens + "<a href='" + url + "'>" + url + "</a>" + rParens;
    });
}
\$\endgroup\$
1
  • 1
    \$\begingroup\$ This works very well, thank you! I plan on extending this to support other brackets. \$\endgroup\$ Commented Oct 29, 2013 at 16:53

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.