I like this essay by John Jones about search algorithms, which he compares to the “mechanical Turk” automatons of the 18th century.
It’s a point that’s well understood in some circles and not at all in others. Witness the degree to which users continue to prefer couching search queries to Google and Siri as natural-language questions: according to Bo Pang and Ravi Kumar, that tendency seems to be steadily increasing, rather than decreasing, as users become more familiar with how search engines work. Users sometimes relate to Google as if it were an oracle, a non-human being with its own personality and knowledge.
Understanding search algorithms as Jones describes them means understanding that however you phrase your query, you’re really asking us, not a creature named Google or Siri. It’s not quite garbage in, garbage out, but it is “what the set of all users and producers of online information know in, what the set of all users and producers of online information know out”. The really tricky thing is to understand how extensive use of that process both changes and expands that set: not just that we put more information online, but that information begets information.
When I started research on the content of children’s television for a co-authored book that was published in 1999, I had three principal sources of information to draw upon. First, my memories and my brother’s memories of watching TV. Second, the memories of contemporaries gathered from real-world conversations and in online discussions on Usenet and other early forums. (Hooray for alt.society.generation-x!) Third, published resources of various kinds, both old and new. Online information about children’s television, independent of message board conversation, was fairly sparse.
Only a few years later, Wikipedia, YouTube and so on came into existence, and at the same time, owners of media libraries began to much more comprehensively push their content out the door in various formats. Today if I want to see every episode of Jabberjaw, know every voice actor’s casting on the show, get comprehensive information about its production and broadcasting, the title character’s appearances in other Hanna-Barbera shows, and the lyrics to a song about the show by the band Pain, I can.
The general implications of this shift are constantly, incessantly discussed. But what I’m not so sure we fully appreciate are the specific implications of online information as a mirror of what we know and how knowing what we know is something that we’ve never really known before.
It’s true that there are still many things that people know, many kinds of information, which are not strongly represented in online repositories. It’s also true, as Eli Pariser has eloquently explained, that both the deliberate infrastructure of online information and the unintended practices arising from our collective use of it are actively excluding or hiding some information through a progressively tighter series of feedback loops. Even if the “filter bubbles” were popped in some fashion, there would be human ways of knowing and interpreting that could never be adequately included in the most capacious digital informational space imaginable.
Those cautions noted, there is still a huge untapped potential for generative changes to the nature of knowledge production. Realizing it requires the intellectual paradigm shift that Jones describes: understanding the mirror of online information for what it is and looking closely at the never-before-seen reflection it provides. Just to cite one example that I have harped on so constantly that I’m sure my Swarthmore colleagues are tempted to punch me in the face every time I say it: suppose that every professor in every institution in the United States published every syllabus they taught in a form where the materials for the course (texts, images, films, etc.) could easily be stripped out and aggregated as metadata.
Suddenly the canon in a particular field of study would no longer be a matter of folk knowledge within a discipline, nor knowledge residing in four or five highly fragmented and proprietary archives (publishers, disciplinary associations, bookstores, etc.). We’d know at any one moment what professionals in a particular field of study deemed the most teachable, useful or authoritative material. We’d know over time how that judgment had changed. We’d know whether what scholars represented as authoritative through citations differed significantly from what they chose to teach.
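The aggregation imagined here could be sketched very simply. The following is a minimal, hypothetical illustration, assuming syllabi were published with their assigned materials as structured metadata; all of the course records and the field/text structure are invented for the example, not drawn from any real archive.

```python
# Hypothetical sketch: if every syllabus were published with its assigned
# materials as structured metadata, recovering a field's de facto teaching
# canon would be a simple counting exercise. All data below are invented.
from collections import Counter

syllabi = [
    {"field": "history", "year": 2010,
     "texts": ["The Great Transformation", "Orientalism"]},
    {"field": "history", "year": 2011,
     "texts": ["Orientalism", "Imagined Communities"]},
    {"field": "history", "year": 2011,
     "texts": ["Orientalism"]},
]

def teaching_canon(records, field):
    """Rank texts by how often they are assigned in syllabi for a field."""
    counts = Counter()
    for record in records:
        if record["field"] == field:
            counts.update(record["texts"])
    return counts.most_common()

print(teaching_canon(syllabi, "history"))
# The most-assigned text rises to the top of the list.
```

Run over time slices (grouping by year before counting), the same tallying would show how a field’s judgment of what is most teachable shifts, which is exactly the before-and-after comparison described above.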
Notice all the things that this knowledge doesn’t resolve in and of itself. It doesn’t tell us what to teach. It doesn’t tell us why or how to teach it. It doesn’t tell us if there’s a very large missing set of materials that professors would prefer to teach but cannot obtain (either out-of-print materials or things which have never been written or created). It doesn’t tell us what students did with this material, or how and whether they learned from it.
What does it tell us, then? It tells us what mirrors always tell us, if we look at them without flinching: the gap between how we look and how we imagine and claim we look. The mirror of information, our multitudinous automaton, shows us hidden depths we’ve never noticed and blemishes we’d rather not see.
Some of what we see makes clear what a mirror will never show us (whip out your Zen koans here: your face before you were born and all that).
Some of what we see puts older just-so stories and tall tales in their place, and that’s no small feat. Think about the way that academics have traditionally represented (and deconstructed) canons to each other. A comprehensive picture of pedagogical usage might surprise us in all sorts of ways, change our sense of what we think our practices are. Yes, with some potential for perverse or unintended effects, as in the case of comprehensively tracking citations and using them as a metric of scholarly value. But mostly I think it is fantastically generative to be able to put aside a massive swamp of arguments and studies that never get beyond an initial attempt to answer the question of “what is it that people actually do”, whether or not the answer is what we expected it to be. Whether we’re scraping data from World of Warcraft to find out the distribution of character choices, compiling the totality of all print publication in world history, or learning what it is that we actually all use in our classrooms, what we see isn’t just the end of some fumbling in the dark; it is the beginning of some more interesting conversations.
The mirror of information clears out the dead brush from the undergrowth. If we know, really know, that some high-culture canons are an infinitesimal fraction of the totality of global cultural production over the last five hundred years, it sharpens our conversation about why that happened, whether we should be studying all of the occluded culture that was lost in the light of a thin crescent of publication or creation, or whether there’s some reason to stay focused largely on that fraction. If we really know what we’re all teaching, what we value in that context of usage, we might have a far clearer view of what we’re trying to accomplish in creating scholarship, of how we read and interpret knowledge, of what works out in usage.
Understanding that search algorithms are a mechanical Turk (that it’s just us hiding inside) is, if we choose to see it as such, another chance to step towards wisdom through self-knowledge.
I find the mechanical Turk analogy confusing for two reasons. First, it understates the importance of technological limitations. Google built a very complicated engine, and they have a lot of control over it, and it’s important to know that they use that control for their own business goals rather than our goals as users. But in the end it’s an engine that has to answer each of a vast number of queries, over terabytes of information, in a fraction of a second. The human chess player inside the Turk is on equal terms with the human opponent outside, but the search engine is not on equal terms with its user.
And second, it causes a collision in my head because there’s a service run by Amazon called Mechanical Turk, in which humans do small tasks for money: http://aws.amazon.com/mturk/
Wikipedia is a top page hit for many searches. This weakens my fears of a “filter bubble” with respect to search.
The top hit for “bank teller feminist” is a case in point: not only do you see the study you were looking for, you get a good discussion of an alternate view on the study.
Easy access to alternate views through Wikipedia is pretty common. It is hard to stay in a bubble when Wikipedia is a top hit.