In all the debate about Google’s approach to digitization, I haven’t seen much discussion of the quality of the results, though people do talk some about interface issues (the Open Content Alliance design is a lot better for readability and use, whatever one thinks of the philosophical issues involved).
Anyway, I raise this because I’m curious whether it’s just me or whether everyone has noticed this and already discussed it, but quite a few of the texts I’ve looked at recently in Google Books seem sloppy: lines at the bottom of the page distorted or unreadable, half-pages missing, weird noise or distortion. On the other hand, it’s kind of charming that some scans come from library copies bearing marginal notations by generations of students.
The Little Professor mentions this periodically.
Although I have rarely come across it, I enjoy the marginalia. The bad scanning is, indeed, annoying. Mostly, though, the feeling it brings out when I come across it, which is not very often (and I am an avid user of Google Books), is sympathy. I always picture some tired undergraduate work-study student after hours of scanning every single day. When they realize they have scanned a page wrong, they make an assessment of whether or not it is still readable; there is little incentive to scan the same page again. For us, the consumers, it impairs our ability to search for words in the book. On the other hand, we have so many resources available for this search (the index, other words that might be close to the one we are searching for, full quotes that might be on the internet somewhere) that it is hard to complain. I also can’t bring myself to look a gift horse in the mouth. I can’t think of a tool more useful to my academic life than Google Books: it makes citations easier and more complete, saves trips to the library, gives me access to texts I would never find in the consortium libraries, and saves me from carrying around books wherever I go.
Back on the subject of quality: the reason I don’t come across much bad scanning might be the publishing dates of the books I am looking at. Many are around ten years old, so they might have digitized versions offered to the search engine by the publishers themselves (Lynne Rienner, for example). If most of the books I needed were badly scanned, I’m sure I’d have less sympathy for the undergraduate scanning them.
bjins
jana
I’ve been intending to write about this problem for years now, starting back in the days of that wretched boondoggle, the Million Book Project(s), but I always end up sputtering in rage. Missing pages aren’t just a promise deferred; they’re a promise possibly permanently killed, given how glibly bureaucrats tend to assume that physical media are disposable once they’ve been “digitally archived.” I’ve heard that many Google collaborators, at least, are aware of the issue, but I haven’t heard of any solutions being put in place yet. Instead, publicity continues to center on copyright issues or the number of volumes.
There’s actually been quite a bit of discussion about this. But it does seem that Google has recently added some quality control measures.