Skip to content


Google Releasing New Open Source Regex Library

As this post on the Google open source blog, regular expressions are pretty fundamental to much advanced hackery. Most implementations, however, share a fairly horrific worst case in terms of time and space complexity. That is to say, under certain circumstances, evaluating a regular expression over a large enough amount of input could hang forever, gobble up all memory allocated to its thread or threads, or both.

Google has been gobbling up the brightest computer scientists and software engineers at a prodigious pace. RE2, their new regular expressions library, is the latest dividend of their carefully hoarded brain trust. The search giant claims their improved implementation completes in linear time and won’t overflow the stack space of its threads. Put simply, they’ve fixed the potential worst case for this critical Swiss Army knife software tool.

Like Go, this is another stab at speeding up the fundamentals most of us building software take for granted. Beyond the worst case, I would imagine such a startlingly different approach may also speed up the more typical cases.

The release is available under a very permissive BSD license. It won’t take long for the wider community to vet their claims. If RE2 lives up to its promise, I imagine it won’t take long to spread far and wide.

Update: See Philip’s comment below.  The predictable performance of RE2 comes at a considerable price: it doesn’t support back references.  Back references let you re-use part of an expression, the value a group actually matched, later in the expression.  They are intensely useful and based on my own experience, I would hesitate quite a bit more at the trade off Russ Cox suggests is so simple.

Related posts

Posted in Programming.

Tagged with .


4 Responses

Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.

  1. Philip Durbin says

    “RE2 drops support for backreferences and generalized zero-width assertions, because they cannot be implemented efficiently” according to http://code.google.com/p/re2 and backreferences are all grayed out as not supported by RE2 on http://code.google.com/p/re2/wiki/Syntax . Will RE2 spread far and wide? Well. . . do you want backreferences?

    In his paper Regular Expression Matching Can Be Simple And Fast, Russ Cox, the Google engineer who also authored the press release, says, “Some might argue that this test is unfair to the backtracking implementations, since it focuses on an uncommon corner case. This argument misses the point: given a choice between an implementation with a predictable, consistent, fast running time on all inputs or one that usually runs quickly but can take years of CPU time (or more) on some inputs, the decision should be easy.”

    I guess so. . . it really comes down to features versus performance.

    • Thomas Gideon says

      Thanks for the catch, Philip. I’ve updated the post to reflect this caveat. I should have realized there was more to the story and dug into it further.

  2. Claes Wallin says

    A regular expression library that only supports actual regular expressions?The days of wonder have not come to an end!

    • Thomas Gideon says

      Had I not been so tired when writing up this post, the breathless tone would have raised a red flag. Still, if you have some application where you don’t need back references but need greater speed/scalability, I suppose it is nice to have this option.
      As it stands, I retract the implication that this is a suitable replacement for even a majority of the uses to which existing implementations have been put.



Some HTML is OK

or, reply to this post via trackback.



Creative Commons License
The Command Line by Thomas Gideon
is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.