PHP Regex for url

Question

I worked out this regex and it is close to working, but one problem remains: it matches any word with more than one period (.)

For example: stuf... (got matched)

How do I limit the regex to "only allow 1 period per set of brackets"?

'#((\w+://)?(\w+\.)([a-z0-9\-/.?=_&%])+)#i'

It's a preg_match to replace links in text, so perhaps a filter couldn't help? I did try {1}, but if I put it in I get an error: '#((\w+://)?(\w+\.)([a-z0-9\-/.{1}?=_&%])+)#i'
– pakito, Commented Jun 28, 2011 at 13:29
Pakito, to which RFC are you referring when asking the question about URLs? It would be good to know the protocol as well. Is this specifically for the http and https protocols?
– hakre, Commented Jun 28, 2011 at 13:50
Both, actually. Something that is able to match as many common URLs as possible. Even ftp, hence the \w at the front.
– pakito, Commented Jun 28, 2011 at 13:52

Floern · Accepted Answer · 2011-06-28 13:53:50Z


Try this:

'#((\w+://)?(\w+)(\.[a-z0-9\-/?=_&%]+)+)#i'

This requires a non-period-char after each period.
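
To see the difference, both patterns can be run against the problem input. This is only a rough sketch: the variable names and the sample strings are assumptions made for illustration, not part of the original question.

```php
<?php
// Pattern from the question: the period sits inside the repeated character class.
$original = '#((\w+://)?(\w+\.)([a-z0-9\-/.?=_&%])+)#i';
// Pattern from this answer: each period must be followed by a non-period character.
$fixed = '#((\w+://)?(\w+)(\.[a-z0-9\-/?=_&%]+)+)#i';

// Sample inputs, chosen for illustration only.
$samples = ['stuf...', 'example.com/path?x=1', 'http://www.example.com'];

foreach ($samples as $sample) {
    // preg_match returns 1 on a match and 0 when nothing matches.
    printf(
        "%-24s original: %d  fixed: %d\n",
        $sample,
        preg_match($original, $sample),
        preg_match($fixed, $sample)
    );
}
```

With these inputs, only the original pattern should accept 'stuf...', while both accept the two URL-like strings.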

But I would recommend something like this:

'#((\w+://)?\w+(\.[a-z0-9\-]+)*\.[a-z\-]{2,}(/[\w\-./?=&%]*)?)#i'

edited Jun 28, 2011 at 13:53

answered Jun 28, 2011 at 13:28

Floern


6 Comments

Igor Korkhov Over a year ago

Unfortunately this accepts URIs like http://.-sample.-com, which are invalid (a dash cannot be the first character).

pakito Over a year ago

lol, I tested http://.-sample.-com on Facebook and it was accepted as a valid URL there as well.

Floern Over a year ago

@pakito: do you want to match a URL in a text or just check its correctness?

pakito Over a year ago

Yes, I want to match URL(s) in text and turn them into hyperlinks. Sorry if I was unclear. Floern: I am currently using your recommended regex and it seems to work well for most URLs. The only thing that needs adding at the end is the underscore and the period. #((\w+://)?(\w+)(\.[a-z0-9\-]+)*\.[a-z\-]{2,}(/[\w\-/?=_.&%]*)?)#i

Floern Over a year ago

My bad, I added the period, but the underscore is already contained in \w.
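
For the use case discussed in these comments (turning URLs in text into hyperlinks), a minimal sketch with preg_replace_callback might look like the following. It uses the recommended pattern from the answer; the function name linkify, the choice to prepend http:// when no scheme was matched, and the sample sentence are all assumptions for illustration.

```php
<?php
// Hypothetical helper: wraps every URL-like match in an <a> tag.
// Assumes $text is plain text that is otherwise safe to output.
function linkify(string $text): string
{
    // Recommended pattern from the answer (path class includes the period).
    $pattern = '#((\w+://)?\w+(\.[a-z0-9\-]+)*\.[a-z\-]{2,}(/[\w\-./?=&%]*)?)#i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        // Group 2 is the optional scheme; assume http:// when it is missing.
        $href = !empty($m[2]) ? $url : 'http://' . $url;

        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
            . htmlspecialchars($url, ENT_QUOTES) . '</a>';
    }, $text);

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $text;
}

echo linkify('Visit http://www.example.com/page?id=1 or example.org today.'), "\n";
```

Because the path part of the pattern allows a period, a sentence-ending period directly after a path would be swallowed into the link, so real text may need extra trimming.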


jefflunt · Accepted Answer · 2011-06-28 13:28:08Z


This should work:

[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}

The issue in the regex you're using is that the greedy "+" is applied to a character class that includes your period, so a run of periods still matches. The regex I posted here requires a literal period followed by 2 to 4 letters at the end of the name.

This pattern will successfully match google.com, www.google.com, and any arbitrary number of subdomains.

NOTE: ICANN recently announced that soon they will allow for any top-level domain (e.g. instead of just .com, .org, etc. they will soon allow .whatever), so you may need to adjust the last part of the regex, "{2,4}", since TLDs will soon be of arbitrary length.
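
A small sketch of how the "{2,4}" limit behaves, and what relaxing it to "{2,}" changes. The ^ and $ anchors, the regex delimiters, the variable names, and the sample host names are assumptions added for illustration; the answer itself gives only the bare pattern.

```php
<?php
// Anchors (^ and $) are an assumption: they make the pattern test a whole
// host name instead of searching inside longer text.
$shortTld = '/^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/';
// Hypothetical relaxed variant with no upper bound on the TLD length.
$anyTld = '/^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/';

// Sample host names, chosen for illustration only.
$hosts = ['google.com', 'www.google.com', 'example.museum', 'a..com'];

foreach ($hosts as $host) {
    printf(
        "%-16s short TLD: %d  any TLD: %d\n",
        $host,
        preg_match($shortTld, $host),
        preg_match($anyTld, $host)
    );
}
```

Note that 'a..com' is accepted by both variants, because the leading character class still contains the period; rejecting consecutive periods needs a pattern that puts the period outside the repeated class.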

answered Jun 28, 2011 at 13:28

jefflunt


2 Comments

pakito Over a year ago

Yes, thanks normalo. That (the TLD) is something I am trying to avoid as well.

jefflunt Over a year ago

Ah, didn't know .museum was already in use. Well, that makes it a bit more challenging, because now you're essentially looking for any.combination.of.valid.characters.delimited.by.periods.without.spaces.and.not.ending.in.a.punctuation.mark :S The challenge comes from the increase in possible false positive matches. I guess you'll just have to try it and see what happens.

Abhay · Accepted Answer · 2011-06-28 15:29:32Z


Well, if you want to validate URLs, why not use parse_url()? I think it's tricky to create a general regex for so many varied URL forms.
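
A rough sketch of that idea follows. One caveat: the PHP manual states that parse_url() is not meant to validate a URL, so this sketch pairs it with filter_var() and FILTER_VALIDATE_URL for the validity check and uses parse_url() only to inspect the parts. The function name isCommonUrl and the list of accepted schemes are assumptions for illustration.

```php
<?php
// Hypothetical helper: true when $candidate is a syntactically valid URL
// with one of the schemes mentioned in the question's comments.
function isCommonUrl(string $candidate): bool
{
    // Built-in syntax check; returns false for strings that are not URLs.
    if (filter_var($candidate, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // parse_url() splits the URL into components such as scheme and host.
    $parts = parse_url($candidate);
    if (!is_array($parts) || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }

    // Assumed whitelist of schemes.
    return in_array(strtolower($parts['scheme']), ['http', 'https', 'ftp'], true);
}

var_dump(isCommonUrl('http://www.example.com/path?x=1'));
var_dump(isCommonUrl('stuf...'));
```

This only checks a single candidate string that includes a scheme; finding URLs inside free text, as asked in the question, still needs a search pattern first.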

answered Jun 28, 2011 at 15:29

Abhay
