Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I've nearly got this working. I wanted to know if there is a much better way. One problem is that no matter what there will be cases where a URL is incorrectly identified and as such the end result will be dependent on requirements. My requirements are lackadaisy, I just want something that works most of the time. However some of the URLs in my fiddle are not being detected correctly and I'd like advice on the regex part of this for overcome these. The main issue here is that the overall design is in question, the should be a simpler method.

Root problem

Fiddle

function replaceURLWithHTMLLinks(text) {
    text = text.replace(/a/g, "--ucsps--");
    text = text.replace(/b/g, "--uspds--");
    var arrRegex = [
        /(\([^)]*\b)((?:https?|ftp|file):\/\/[-A-Za-z0-9+&@#\/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#\/%=~_()|])(.?\))/ig,
        /()(\b(?:https?|ftp|file):\/\/[-a-z0-9+&@#\/%?=~_()|!:,.;]*[-a-z0-9+&@#\/%=~_()|])(.?\b)/ig];
    for (i = 0; i < arrRegex.length; i++) {
        text = text.replace(arrRegex[i], "$1a$2b$3");
    }
    text = text.replace(/a([^b]*)b/g, "<a href='$1'>$1</a>");
    text = text.replace(/--ucsps--/g, "a");
    text = text.replace(/--uspds--/g, "b");
    return text;
}
var elm = document.getElementById('trythis');
elm.innerHTML = replaceURLWithHTMLLinks(elm.innerHTML);

Any thoughts?

share|improve this question
 
what browser are you using? all of the links show up for me in chrome. and the code seems to be working. I am not very good at JavaScript, but there doesn't seem to be anything that really jumps out at me. –  Malachi Oct 28 '13 at 20:27
1  
I think that this contains code that does what the OP wants it too, but the OP wants a Better way to Handle what it does. the OP doesn't need the Javascript to catch all of the URLS just Most of them, right?? –  Malachi Oct 29 '13 at 1:43
1  
I'm using Chromium. Some of the URLs have extra 'junk' at the end. Specifically the last few have "." and ".)". –  Mike Mestnik Oct 29 '13 at 16:49
1  
I'd like as many browsers to catch as many URLs as possible. This should be common code that everyone can use for any project, I'm amazed that there are so few good examples. –  Mike Mestnik Oct 29 '13 at 16:51
 
@Malachi “some of the URLs […] are not being detected correctly and I'd like advice […] to overcome these” That sounds to me like it doesn't work the way Mike wants. –  svick Oct 30 '13 at 1:22
add comment

closed as off-topic by Stuart, Flambino, Dave Jarvis, Jamal, Mat's Mug Oct 29 '13 at 2:57

This question appears to be off-topic. The users who voted to close gave this specific reason:

  • "Questions must contain working code for us to review it here. For questions regarding specific problems encountered while coding, try Stack Overflow. After your code is working you can edit this question for reviewing your working code." – Stuart, Flambino, Dave Jarvis, Jamal, Mat's Mug
If this question can be reworded to fit the rules in the help center, please edit the question.

1 Answer

up vote 1 down vote accepted

I think that it's nutty to replace all characters 'a' and 'b' with junk, then un-replace them later, just so that you can use 'a' and 'b' as landmarks to indicate which points your two regular expressions have in common. If you're going to do that, then why not just use "--ucsps--" and "--uspds--" as your landmarks in the first place? Of course, using any special strings as landmarks is a bad idea.

The fundamental problem is that you want to exclude the same number of closing parentheses as there are open parentheses. You can't use a regular expression to match left and right parentheses, because such an expression would not be regular. Well, some flavours of "regular" expressions support backreferences within the expression itself, which could do the job. RegExps in JavaScript don't.

What JavaScript can do, though, is let you specify your own code to generate the replacement text. Here's what I came up with:

function replaceURLWithHTMLLinks(text) {
    var re = /(\(.*?)?\b((?:https?|ftp|file):\/\/[-a-z0-9+&@#\/%?=~_()|!:,.;]*[-a-z0-9+&@#\/%=~_()|])/ig;
    return text.replace(re, function(match, lParens, url) {
        var rParens = '';
        lParens = lParens || '';

        // Try to strip the same number of right parens from url
        // as there are left parens.  Here, lParenCounter must be
        // a RegExp object.  You cannot use a literal
        //     while (/\(/g.exec(lParens)) { ... }
        // because an object is needed to store the lastIndex state.
        var lParenCounter = /\(/g;
        while (lParenCounter.exec(lParens)) {
            var m;
            // We want m[1] to be greedy, unless a period precedes the
            // right parenthesis.  These tests cannot be simplified as
            //     /(.*)(\.?\).*)/.exec(url)
            // because if (.*) is greedy then \.? never gets a chance.
            if (m = /(.*)(\.\).*)/.exec(url) ||
                    /(.*)(\).*)/.exec(url)) {
                url = m[1];
                rParens = m[2] + rParens;
            }
        }
        return lParens + "<a href='" + url + "'>" + url + "</a>" + rParens;
    });
}
share|improve this answer
1  
This works very well, thank you! I plan on extending this to support other brackets. –  Mike Mestnik Oct 29 '13 at 16:53
add comment

Not the answer you're looking for? Browse other questions tagged or ask your own question.