Crowdsourcing typo identification in e-books

November 29, 2011

I personally think that crowdsourcing is overblown; it’s the latest trick pony, and I’m not convinced it’ll solve nearly as many problems as the research (and technical) community occasionally seems to think it will (particularly when the economy picks up again and we don’t have as many un- or under-employed people). But there are some cases where crowdsourcing seems like a clear opportunity to add value with minimal additional cost.

A case in point: why on earth don’t e-book publishers work with Amazon, Google, and Apple (and other e-book reader providers) to enable crowdsourced identification of typos in e-books? You’ve got a large (and growing) audience already engaged in reading the books, and they’re finding the typos anyway. I’m pretty sure that if you made it easy for people to flag typos while reading, they’d gladly do it (provided the publishers then fixed the identified typos and provided the improved versions to people who’d purchased the books). You could get tons of data very cheaply, and actual typos should be clearly distinguishable from reader mistakes (reports of non-typos) by the volume of reports: many readers will flag the same real typo, while a mistaken report will tend to stand alone.
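To make the volume argument concrete, here’s a minimal sketch in Python. Everything in it is hypothetical: the report format (a reader ID plus a location string), the `likely_typos` name, and the threshold of three readers are my assumptions for illustration, not any e-reader’s actual API or data.

```python
from collections import Counter

def likely_typos(reports, min_reports=3):
    """Return (location, reader_count) pairs flagged by at least
    min_reports distinct readers, most-reported first.

    reports: iterable of (reader_id, location) tuples. The format is
    an assumption made for this sketch.
    """
    # Count each reader at most once per location, so a single
    # enthusiastic reader can't inflate the numbers.
    unique = {(reader, location) for reader, location in reports}
    counts = Counter(location for _, location in unique)
    flagged = [(loc, n) for loc, n in counts.items() if n >= min_reports]
    # Sort by count (descending), then location, for a stable order.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))

# Hypothetical reports: one spot flagged by three readers (one of them
# twice), one flagged by two readers, and one flagged only once.
reports = [
    ("r1", "ch1:120"), ("r2", "ch1:120"), ("r3", "ch1:120"), ("r1", "ch1:120"),
    ("r4", "ch2:88"),
    ("r2", "ch5:301"), ("r5", "ch5:301"),
]

print(likely_typos(reports))
```

With the default threshold, only the location three different readers agreed on survives; the one-off report drops out. Picking the right threshold (or scaling it by how many people have read the book) is the part you’d actually have to tune against real data.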

In fact, I can only think of two reasons this hasn’t already been done. One, no one’s thought of it yet. Which I doubt. But in case they haven’t, here’s the idea! Now please, pick it up and use it! Two, publishers aren’t sufficiently interested in providing quality e-books. Which frankly I could see; in some cases it seems like they can’t even be bothered to spell-check the e-books they produce (which should be trivial). I wonder if one of the reader providers could then fill the gap themselves, or whether they’d be risking copyright infringement for providing modified copies of published (but flawed) works. Probably.

Anybody got contacts in either the publishing or e-book reader/store worlds that can provide insight into why we don’t already have such functionality?

From → Books, Musings, Research

  1. John Regehr

    It’s not that they’re not interested so much as there’s little or no added profit to be had by producing products with fewer typos. Also, I think you underestimate the cost of applying crowdsourced fixes; English has plenty of dialects and community-specific conventions, and the noise could easily swamp the signal. Wikipedia seems to be rife with little editing battles; do we really need more of that?

  2. Jeff

    I’m not sure I buy that the noise would swamp the signal; I suspect what you’d see is the low-hanging fruit pop out (the misplaced period or missing word that everyone spots), and then regionalisms or mistaken reports fade into the noise.

    I do, however, totally buy that publishers might not think the return would be sufficient (they’ve obviously already decided that they can live with the existing rate of typos in e-books, at least given the costs of applying whatever tools they currently have in place to find them a priori). At which point I wonder if those with e-book stores could offer to take on the work for publishers as a way to differentiate their offerings from other e-book stores…

  3. Jeff,

    I just came across your article and I was thinking the same thing. I just got a book off of Amazon that suffers from OCR scanning errors. The worst part of the typos, apart from looking unprofessional, is that they detract from the immersion of a good story. The Kindle shows popularly-highlighted sections of text–why not the ability to flag typos/grammatical mistakes?

