Affordable DIY Book Scanner

One of the barriers to widespread file sharing of books is the slow and labored adoption of ebooks by the industry, along with the cost, in both labor and materials, of digitizing print editions. As publishers experiment with repeating the mistakes the music industry made before them, with disproportionately high pricing of digital editions and crippling DRM, do-it-yourselfers are expressing their frustration by simply routing around the problem.

Wired details one such case: a student who has cobbled together a workable book scanner for about $300. It still involves some manual effort, since it doesn’t include any means to turn pages automatically. But it overcomes the limitations of the other commodity option, consumer-grade flatbed scanners. According to the article, a community already seems to be growing around Reetz’s design, so solutions for automating page turning may yet be suggested, along with countless other improvements.

Appropriately enough, the Wired piece also has a bit of legal analysis. UCB professor Pamela Samuelson contends that the project should fall largely under personal fair use. She also considers how this may affect competing ebook offerings from publishers, hopefully for the better.

2 Replies to “Affordable DIY Book Scanner”

  1. Nice idea, and I’m sure someone with a couple of cheap webcams can make that scanner. But, as you mention, and as is pointed out in the video, it takes about 20 minutes to scan an average book this way, assuming that you don’t make any mistakes and the book is in good shape.

    HOWEVER, there is still a major stumbling block that isn’t talked about in the article: OCR. You can make images of the pages all you want, but until the content has been OCR’ed, all you have is a bunch of pictures of pages. Not very usable. A few years ago I tried cobbling together a system to take a bunch of PDFs that each consisted of a set of embedded TIFFs, pull the images out, OCR them, then put everything into a new document. What I found was that (a) the OCR software itself was sub-optimal, continually requiring manual intervention (for images that were slightly askew, irregularities in the documents, etc.), (b) the results were unreliable and required heavy proofing, (c) this didn’t address front or back matter (tables of contents, glossaries, references, or indexes), and (d) images still needed to be manually clipped and inserted into the documents.

    So, I welcome this scanner idea, but until we’ve really handled all the other parts of the process that need to be addressed, it’s only the tip of the iceberg.

    I also find myself questioning the scanner a little bit. What’s wrong with the current flatbed scanners that have a page feeder? To scan a book, you saw the binding off and feed the pages through the scanner. The only time I see Reetz’s device being really useful is when the physical object must be preserved because of its value (historical, social, etc.).

    Anyway, just a few thoughts.

  2. I don’t necessarily disagree with your points about OCR and flatbed scanners.

    But Reetz’s project has gotten people thinking and talking more broadly about book scanning, and realizing that however you go about it, it is feasible to undertake as a consumer.

    I see the real value as catalyzing further participation in distributed efforts like Project Gutenberg, or putting into the hands of dedicated individuals the ability to digitize a few out-of-print but beloved works where preservation is an issue and there is the motivation to overcome the quirks. If enough people are enabled and/or motivated, that is what will start the accelerating improvements we’ve seen in other areas of peer production. That’s what really gets me interested.
