I have talked and written about CAPTCHAs repeatedly, initially in admiration of the elegance of the idea originally developed at CMU by Luis von Ahn. As time has marched on, though, many CAPTCHA implementations have fallen to the ever-increasing power and sophistication of attackers. The cleverest attackers have farmed out the solving of CAPTCHAs to actual people. This sidesteps what ordinarily makes CAPTCHAs so effective: puzzles that are computationally difficult but relatively easy for humans to solve.
Personally, I still think the core idea has permutations, and a certain neatness and simplicity, yet to be exhausted. Von Ahn would seem to agree. He built on the defensive aspects of the CAPTCHA to come up with reCAPTCHA, a project that uses optical character recognition failures both as puzzles that prove a user is human rather than a bot and as a way to serve a public good. As reCAPTCHA's challenges are solved and vetted, the results feed back into the OCR projects from which they originated. This improves the digitization of texts more cheaply and effectively than other, more individually labor-intensive techniques.
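The post doesn't spell out how challenges get "vetted." reCAPTCHA is generally described as pairing a control word, whose transcription is already known, with a word the OCR software could not read. Answering the control word correctly is what proves the user is human, and that user's reading of the unknown word then counts as one vote toward its transcription. The following is a minimal sketch of that idea under loud assumptions: the class name `DigitizationQueue`, the case-insensitive matching, and the three-vote agreement threshold are all hypothetical, and the real service's acceptance rules may well differ.

```python
from collections import Counter, defaultdict
from typing import Optional


class DigitizationQueue:
    """Hypothetical model of control-word vetting plus crowd voting."""

    def __init__(self, votes_needed: int = 3) -> None:
        # Assumed threshold: agreeing human votes needed to accept a word.
        self.votes_needed = votes_needed
        self.votes = defaultdict(Counter)  # unknown word id -> guess counts
        self.accepted = {}                 # unknown word id -> transcription

    def submit(self, control_answer: str, control_truth: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Return True if the user passes the challenge."""
        # The control word's answer is known, so it alone decides pass/fail.
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False
        # A user who passed is treated as human; their guess is one vote.
        guess = unknown_answer.strip().lower()
        self.votes[unknown_id][guess] += 1
        word, count = self.votes[unknown_id].most_common(1)[0]
        if count >= self.votes_needed:
            self.accepted[unknown_id] = word  # feed back to the OCR project
        return True

    def transcription(self, unknown_id: str) -> Optional[str]:
        return self.accepted.get(unknown_id)


if __name__ == "__main__":
    q = DigitizationQueue()
    for guess in ["morning", "morning", "mourning", "morning"]:
        q.submit("upon", "upon", "scan-17", guess)
    print(q.transcription("scan-17"))  # prints: morning
```

Note that a user who fails the control word contributes no vote at all. That is the sense in which the same puzzle does double duty as a security check and a filter on the digitization data.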
Google has also invested considerably in CAPTCHA implementations, working feverishly to stay ahead of attackers. Given its beleaguered Books project, which at its core is a large-scale effort to digitize texts, it is hardly surprising to see news this past week that Google has acquired von Ahn's commercial spin-off of the original academic reCAPTCHA project. According to the NYT, von Ahn has collaborated with Google before, on a similar crowdsourced effort to supplement machine categorization of information, specifically the tagging of photos.
According to Ars Technica, reCAPTCHA hadn't previously contributed to Google's Books project, but the acquisition makes sense both for that project and for continuing to evolve the defenses Google uses for its many services. Von Ahn will become a Google employee, no doubt working on both collective user efforts and creative security initiatives. Hopefully some or all of his reCAPTCHA staff will be joining him.
Lauren Weinstein does urge some caution about this otherwise optimistic union. He details his recent evaluation of reCAPTCHA for use with a forum he was setting up. His post has a good explanation of the data that reCAPTCHA may be logging as participating sites and users make use of it. The potential privacy risks here are pretty clear, and he unfortunately had some difficulty finding the project's policies on how it treats this data.
He thinks the acquisition is an opportunity, a critical one, for Google to remedy this situation. I think there is good reason to believe that they will. This is an issue worth keeping an eye on, so that the new efforts of the reCAPTCHA folks at Google aren't hobbled by arguments over the unintended consequences of moving their work to the search giant, where the risks of data retention and correlation are even greater.