Malartu

reCAPTCHA

Emailed on May 29th 2020 in The Friday Forward

NPR hosts a wonderful podcast called "How I Built This with Guy Raz" that publishes interviews with successful founders. One of the more recent episodes featured Luis von Ahn. You can listen here, but as always, here's the scoop:

In 2000, Luis von Ahn was starting his PhD in computer science when he attended a talk and happened to learn about one of Yahoo's biggest problems: automated bots were signing up for millions of free Yahoo email accounts, and generating tons of spam.

Von Ahn came up with a solution he would call CAPTCHA, or Completely Automated Public Turing test to tell Computers and Humans Apart. After building CAPTCHA, he effectively gave it to Yahoo. It proved wildly successful and was quickly adopted as their standard for verifying humans.

Next, von Ahn wanted to find a way to harness the collective effort of so many people filling out CAPTCHAs, so he created reCAPTCHA in 2007.

This is where things get interesting.

reCAPTCHA was a CAPTCHA program like the others, except that it was free for websites to integrate. That led to widespread adoption and made reCAPTCHA the internet’s standard CAPTCHA program. The genius behind von Ahn’s business model was that every time users verified themselves, they were also helping to create indexable, digitized text from hundreds of years of books, magazines, journals and newspapers.

Each time you fill out a reCAPTCHA, you're not just verifying you're human, you're transcribing images of text that Optical Character Recognition (OCR) software can't read. OCR software is the main automation behind turning physical texts into digital ones.
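For the curious, here is a minimal Python sketch of how that kind of crowdsourced transcription could work. It is not von Ahn's actual code. It assumes the commonly described setup in which each challenge pairs a control word the system already knows with a word OCR failed on, and the function names and the three-vote threshold are hypothetical choices made for illustration.

```python
from collections import Counter


def record_answer(votes, control_word, control_answer, unknown_answer):
    """Count a guess for the unknown word only if the control word was typed correctly.

    Returns True when the user passed the check (hypothetical rule: an exact,
    case-insensitive match on the control word).
    """
    passed = control_answer.strip().lower() == control_word.strip().lower()
    if passed:
        votes[unknown_answer.strip().lower()] += 1
    return passed


def accepted_transcription(votes, min_votes=3):
    """Return the leading guess once it has at least min_votes, otherwise None."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


# Three users pass the control word and agree on the unknown word;
# one user fails the control word, so that guess is ignored.
votes = Counter()
record_answer(votes, "morning", "morning", "upon")
record_answer(votes, "morning", "Morning", "upon")
record_answer(votes, "morning", "mourning", "open")
record_answer(votes, "morning", "morning", "Upon")
result = accepted_transcription(votes)
```

The point of the design is that a correct answer on the known word is what earns trust in the answer on the unknown one, and agreement across several people stands in for a proofreader.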

That's something folks pay money for, and that's exactly what the New York Times did.

Von Ahn first partnered with the New York Times to digitize all of its back issues. Within a few months, reCAPTCHA had digitized the previous 20 years of New York Times issues. Within the first year, 440 million words were deciphered, the equivalent of 17,600 books.
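As a quick sanity check on those figures, the short Python calculation below derives the book length they imply. The variable names are just for illustration, and the per-book figure is inferred from the two numbers above rather than stated in the interview.

```python
# Figures quoted above for reCAPTCHA's first year.
words_deciphered = 440_000_000
equivalent_books = 17_600

# Average book length implied by those two numbers.
words_per_book = words_deciphered // equivalent_books
print(words_per_book)
```

In other words, the comparison assumes an average book of about 25,000 words.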

In the interview, von Ahn discloses that at this point the crowdsourced program was generating nearly $40k in revenue every few days. It was still managed only by von Ahn and one of his undergraduate students. Von Ahn was still teaching full time, and the student was still taking a full-time course load.

In 2009, reCAPTCHA was purchased by Google for an undisclosed amount. Google used it to help build its Google Books library, which is now one of the largest digital libraries in the world. Google has since used reCAPTCHA for other purposes, such as having users identify street names and addresses from Google Maps Street View.

reCAPTCHA has proven to be in keeping with the adage so often associated with Google: if you are not paying for the product, you are the product.

