Which Regular Expression Algorithm does Javascript use for Regex?

Question

I was reading this article today on two different regular expression algorithms.

According to the article old Unix tools like ed, sed, grep, egrep, awk, and lex, all use what's called the Thompson NFA algorithm in their regular expresssions...

However newer tools like Java, Perl, PHP, and Python all use a different algorithm for their regular expressions that are much, much slower.

This article makes no mention at all of Javascript's regex algorthim, (and yes I know there are various JS engines out there) but I was wondering if anybody knew which of those algorithms they use, and if maybe those algorithms should be swapped out for Thompson NFA.

Charlie Martin · Accepted Answer · 2009-04-07 22:58:22Z

up vote 7 down vote accepted

The Javascript ECMA language description doesn't impose a requirement for the particular implementation of regular expressions, so that part of the question isn't well-formed. You're really wondering about the particular implementation in a particular browser.

The reason Perl/Python etc use a slower algorithm, though, is that the regex language defined isn't really regular expressions. A real regular expression can be expressed as a finite state machine, but the language of regex is context free. That's why the fashion is to just call it "regex" instead of talking about regular expressions.

Update

Yes, in fact javascript regex isn't ~~content free~~ regular. Consider the syntax using `{n,m}', that is, matches from n to m accepted regexs. Let d the difference d=|n-m|. The syntax means there exists a string ux^dw that is acceptable, but a string ux^k>dw that is not. It follows via the pumping lemma for regular languages that this is not a regular language.

(augh. Thinko corrected.)

edited Apr 7 '09 at 22:58

answered Apr 7 '09 at 21:05

Charlie Martin
63.7k10107189

1

Actually, they're more than full regular expressions. "{n,m}", for example, can't be represented in an FSA for arbitrary n,m. – Charlie Martin Apr 7 '09 at 21:51

1

@Charlie Well done Charlie! Wow. – leeand00 Apr 8 '09 at 3:39

1

Your "update" about {n,m} is wrong. x{3,5} can be written as xxx|xxxx|xxxxx which is perfectly regular and handled perfectly well with a DFA engine. – Jan Goyvaerts Apr 8 '09 at 9:41

3

The unbounded x{3,} can be rewritten as xxxx* which is regular and can be implemented with a DFA with 4 states. Try it at osteele.com/tools/reanimator – Jan Goyvaerts Apr 10 '09 at 6:38

1

I think that the demonstration is bogus. Since n and m are fixed when the regex is defined, you always can find an integer p > 0 (that only depends on the regex) such that every string w of length at least p and that matches the regex, can be written as w = xyz, |y| > 0, |xy| ≤ p, for all i ≥ 0, xy{i}z matches the regex. In my view, it's the x(?=y) syntax that is not regular. – boumbh Feb 5 at 9:24

| show 24 more comments

Peter Perháč · Answer 2 · 2009-04-07 21:04:01Z

up vote 8 down vote

Looking at Wikipedia I found out it's "Traditional NFA". A simple question deserves simple answer :)

answered Apr 7 '09 at 21:04

Peter Perháč
8,419966109

Hey that's neat. thanks! – leeand00 Apr 7 '09 at 21:08

That doesn't make it true. I don't think the javascript language is really finite state. – Charlie Martin Apr 7 '09 at 21:09

Digging for truth or searching for answers? What is this site all about? – Peter Perháč Apr 7 '09 at 21:11

@Charlie @MasterPeter Hmm...maybe that's why it isn't on the list of programs on Wikipedia...(and also why the list lacks browsers...) – leeand00 Apr 7 '09 at 21:17

In the mean time, had a look at the syntax. It's definitely not finite state. – Charlie Martin Apr 7 '09 at 21:53

| show 2 more comments

Chas. Owens · Answer 3 · 2009-04-07 21:04:21Z

up vote 3 down vote

Perl uses a memoized recursive backtracking search and, as of some improvements in 5.10, no longer blows up on perl -e '("a" x 100000) =~ /^(ab?)*$/;'. In recent tests I performed on an OS X box, Perl 5.10 outperformed awk, even in the cases where awk's algorithm was supposed to be better.

answered Apr 7 '09 at 21:04

Chas. Owens
41.5k779168

So that article is most likely outta date then... – leeand00 Apr 7 '09 at 21:08

add a comment |

Jan Goyvaerts · Answer 4 · 2009-04-10 06:45:28Z

up vote 1 down vote

Though the ECMA standard does not specify the algorithm an ECMAScript implementation should use, the fact that the standard mandates that ECMAScript regular expressions must support backreferences (\1, \2, etc.) rules out the DFA and "Thompson NFA" implementations.

edited Apr 10 '09 at 6:45

answered Apr 8 '09 at 9:38

Jan Goyvaerts
10.7k33148

Backreferences rule out DFA (deterministic finite automaton), but there are other ways to solve the problem (e.g. recursive backtracking). Perl uses memoized backtracking recursion which removes a lot of the downsides to recursive backtracking (still eats a lot of memory on certain patterns though). – Chas. Owens Apr 8 '09 at 11:56

add a comment |

asked	5 years ago
viewed	1675 times
active	5 years ago

current community

your communities

more stack exchange communities

Which Regular Expression Algorithm does Javascript use for Regex?

4 Answers 4

Update

Your Answer

Not the answer you're looking for? Browse other questions tagged javascript regex algorithm language-agnostic unix or ask your own question.

Visit Chat

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

Which Regular Expression Algorithm does Javascript use for Regex?

4 Answers 4

Update

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged javascript regex algorithm language-agnostic unix or ask your own question.

Visit Chat

Linked

Related

Hot Network Questions