As this post on the Google open source blog explains, regular expressions are pretty fundamental to much advanced hackery. Most implementations, however, share a fairly horrific worst case in terms of time and space complexity. That is to say, under certain circumstances, evaluating a regular expression over a large enough input could hang effectively forever, gobble up all the memory allocated to its thread or threads, or both.
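To illustrate that worst case (my example, not one from Google's post), a backtracking engine such as Python's `re` module can take time exponential in the input length on a pattern with nested quantifiers:

```python
import re

# A pathological pattern for backtracking engines: the nested quantifiers
# force the matcher to try exponentially many ways to split the run of 'a's.
pattern = re.compile(r'(a+)+b')

# With 15 'a's and no trailing 'b', a backtracking matcher explores on the
# order of 2^15 paths before giving up; each additional 'a' roughly doubles
# the work, so a few dozen characters are enough to stall it for hours.
result = pattern.match('a' * 15)
print(result)  # → None, but only after exhaustive backtracking
```

RE2's claim, in essence, is that patterns like this one match (or fail) in time proportional to the input, never exponential in it.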
Google has been hiring the brightest computer scientists and software engineers at a prodigious pace. RE2, their new regular expressions library, is the latest dividend of that carefully hoarded brain trust. The search giant claims the improved implementation matches in time linear in the length of the input and won’t overflow the stack space of its threads. Put simply, they’ve fixed the potential worst case for this critical Swiss Army knife of software tools.
Like Go, RE2 is another stab at speeding up the fundamentals most of us building software take for granted. Beyond the worst case, I would imagine such a startlingly different approach may speed up the more typical cases, too.
The release is available under a very permissive BSD license, so it won’t take long for the wider community to vet Google’s claims. If RE2 lives up to its promise, I imagine it will spread far and wide.
Update: See Philip’s comment below. The predictable performance of RE2 comes at a considerable price: it doesn’t support back references. A back reference lets you re-use, later in an expression, the exact text a group actually matched earlier. They are intensely useful, and based on my own experience, I would hesitate quite a bit more at the trade-off than Russ Cox suggests is warranted.
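For the record, here is what a back reference does in a backtracking engine like Python's `re` (my example, not from the post): `\1` matches whatever text group 1 captured, not the group's pattern, which is exactly the feature RE2's linear-time construction cannot express.

```python
import re

# \b(\w+) \1\b matches a word followed by a second copy of the same word:
# the back reference \1 stands for the text group 1 actually captured,
# not for the pattern \w+ itself.
double_word = re.compile(r'\b(\w+) \1\b')

m = double_word.search('this sentence has has a repeated word')
print(m.group(1))  # → has
```

Finding doubled words, matching opening and closing quote characters, and similar tasks all lean on this feature; losing it is the cost of RE2's guarantees.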