I’m still struggling with how to begin a project that I would like to be a lifelong commitment for the rest of my career. The issues are technical and conceptual.
What I want to do is begin publishing and archiving my notations on scholarly (and maybe some non-scholarly) readings, to document my workflow as a reader and thinker. I see this as having several important uses.
First, scholars studying book history and print culture have now thoroughly made me over into a believer in the value of marginalia. The beauty of digitization in this sense is that it allows us to create marginalia without defacing a singular physical copy of a text. On the other hand, the ephemeral character of a lot of digital reading and notation means that much of the marginalia that we might have will never come into being or will be lost.
Second, I think one of the major reputational problems that academia has, particularly humanities scholars, is that much of our workflow is invisible or poorly understood even by sympathetic publics. I’ve been cleaning out a closet here in my office this week and even I’m a bit flabbergasted by the transcript of my working life since 1988: piles and piles of notes and commentary on readings, compilations of library records intended to drive both research and my awareness of my fields of specialization, and a lot of other tracings of a working life. At least some of this might be more useful to me and to others as a tool for inquiry and study if it were searchable and visible, but such records might also help me and other scholars to document and explain what it is that we do.
I’m clear about what it is that I want to do and why I want to do it. What I’m not clear on is how and where.
Here are the specifications that I want to match:
1. I want to publish and archive these digital marginalia as data in a platform-agnostic form that could be pushed into multiple locations. Say, for example, that I could have a page or location at this blog where they would appear, but I might also publish them through my catalog at LibraryThing and as shared Zotero notes.
2. I would like the marginalia to have a fixed link from the notes to the specific bibliographic record of the material that they are based upon, and to have that link embedded in the data.
3. I want the baseline marginalia to be machine-readable and available to anyone else who would like to take the data and associate it with relevant catalogs or other archives, under a Creative Commons license that only requires attribution. The attribution I’d actually like to embed into each record so that the data-sharing can be automated. Other users would be free to add their own metadata or cataloging information.
4. I want to create some kind of simple tagging scheme that can support my own folksonomy to help me understand and search the eventual archive of my notes while having the total archive also be searchable in a more open-ended way.
5. I want the archive to be visible to external search engines.
6. I want the archive to have no special or particular support needs or costs beyond the storage space and Internet connectivity required. E.g., nothing that would require serious customization and maintenance by staff besides myself. This is another reason for platform-agnosticism, to minimize the futureward hassles involved in migrating the data to future information infrastructure.
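As a rough illustration of specifications 1 through 4, a single note could be stored as a small self-describing record that carries its own source link, tags, attribution, and license. The sketch below is only one possibility, written in Python; the field names, the `make_note` helper, the placeholder identifier, and the choice of JSON are all assumptions made for the example, not a settled format.

```python
import json

def make_note(text, source_id, tags, author, license_url):
    """Build one marginalia record as a plain dict (hypothetical field names)."""
    return {
        "note": text,
        # Fixed link to the bibliographic record the note is based on.
        "source": source_id,
        # Normalize tags: lowercase, de-duplicated, sorted.
        "tags": sorted({t.strip().lower() for t in tags}),
        # Attribution and license travel with every record.
        "attribution": author,
        "license": license_url,
    }

record = make_note(
    text="Example note text.",
    source_id="urn:isbn:0000000000",  # placeholder, not a real identifier
    tags=["Marginalia", "print culture", "marginalia"],
    author="Example Author",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

# Plain JSON text can be pushed to any location and read back unchanged.
serialized = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Because the record is ordinary text with the attribution inside it, any site or catalog that pulls it in gets the credit line and the source link automatically.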
I can see how to do some of these things with my very crude knowledge, but most of what I can think of fails at least one, maybe several, of these conditions, and is in many ways quick and dirty. If I’m going to invest the effort in shifting this workflow from written notes and fragmented data that I have in both analog and digital form, I want to do it in as stable and useful a form as I can manage.
Suggestions, ideas, criticisms all welcome.
There is some devil in the details to be sorted out, but I’m almost certain that Drupal is the tool you’re looking for.
There seem to be four issues here:
1) Storing the data. This is easy. Even a lifetime of work would easily fit into a simple database (flat text probably, definitely MySQL)
2) Entering your data. This seems to be hard, unless you’re spending tons of time converting written notes into digital notes. I’d imagine some kind of plugin infrastructure with various PDF readers might make the most sense.
3) Entering your source data. I don’t think there is any standard format for historical texts, so I have no idea how you’d store exact document references. Probably the simple way is a bibliography of sorts, but then you hit the issue of different versions / formats. Fun.
4) Displaying the data. This would require an API or data feed (for integration into LibraryThing etc.), since you want it to hook into a lot of sites. I’d be inclined to go with Django here. Presumably Drupal could do the same thing. This seems to be a relatively simple problem compared to (2) and (3)
The work you are describing is being addressed at the conceptual level by the W3C and NISO standards orgs. These two efforts are referenced in context in today’s entry about shareable annotations at http://books-on-books.com/2012/08/07/an-e-reader-annotation-mini-manifesto/. Hope that helps.
It occurs to me that this is somewhere Google Glass could excel: OCR everything you read, and suck in / automatically associate all your notes.
I need to think through your requirements more, but piggy-backing off of Eric’s comment, Islandora (Drupal front end with a Fedora back end) may do the trick.
So one clarification–and thanks, this is terrifically useful so far. The primary plan is to cover new notes that I generate “by hand” as I go through a book or article, just as I have before. Except that unlike 15 years ago, as I take notes today, it tends to be typing on a keyboard into a digital application. Most commonly a word processor or sometimes a note-taking application like Notability. I’ve experimented with other note-taking formats–I don’t like Zotero for notes, for example. But the major point is that we don’t have to worry too much about existing, old notes. Those I’ll either try to see if I can’t OCR or just manually type in if I get the time and the inclination. The plan just has to cover new notes that I generate from the point of the project’s beginning forward to the time that I croak and/or stop being capable of generating notes.
Tim, you may have seen the recent Chronicle blog about a group of biologists who collected their handwritten field notes into a published book? Perhaps there is also a small window in academic publishing for exactly what you’re describing here?
Probably an obvious question and something you’ve already done, but have you asked your college archivist about this? Many university archives are interested in preserving and enabling access to this kind of work, either within the archives themselves or by advising and assisting personal and local initiatives like yours. They will have expertise in all or most of the aspects of classification, preservation, and publication that you describe.
Tim, this is a great plan. You might find inspiration in a series of pieces by French writers and scholars, notably Maurice Olender, about their working worlds: enter “Le lieu de l’archive” in Worldcat and you’ll find them.
Rachel:
I think actually that this project almost works in tension with some archival practices. First, because I want it to be “born digital” and only incidentally ‘deposited’ or ‘curated’ in some sense. The curation would largely be limited to my entry and tagging of notes, which I expect to be as whimsical and mutable as my habits of mind and labor. The key things for me are to document my workflow, to demonstrate how scholarly knowledge accrues incrementally (this is the problem with our overemphasis on publication: we look only at the end product as proof of scholarly knowledgeability), and to provide that workflow to any other curatorial or informational project that wants to pull in that data and repurpose it. A lot of archives by either habit or conscious design still seem to me to be depositories: documents and data go in, and don’t circulate freely for reuse and reconfiguration.
What I’m hearing is mostly that you want a repository of more-or-less raw documents with some tagging for organization and findability. My take is that Drupal is overkill for that. Drupal is from a web app tradition, and it will work but will take more learning up front and more maintenance over time than necessary. On the other hand, Drupal makes sense if your requirement 1 for pushing the live data to whatever format is key—that’s a quite difficult spec to meet with generality and may well take custom coding, which is natural in the web app tradition.
For a repository I would look first to the business “content management” tradition or to digital library document repository software. The basic description of a document repository is my first sentence, and allowing many ways to access the data is also considered important so there’s always a way to add on different means of access. A good first try might be DSpace:
http://www.dspace.org/
Given your goals, I’d strongly advise against using one of today’s popular web frameworks as your source of truth. They are relatively complicated beasts, and they will either die or evolve into something completely unrecognizable long before your career is over.
Your best shot at simplicity and longevity is to avoid DBs and stick with flat files in directories. It might be that decades from now, the files-in-directories metaphor will be super-arcane, or even completely break down altogether — but I’d say that’s still your best shot.
For a base note-taking format, consider one of the “lightweight markup languages”, Markdown or reStructuredText. A big advantage here is that even if the entire ecosystem of tools and libraries written around these formats *completely disappears* overnight, you’re not screwed. As long as you can still open the file in 2050, it will still be plainly readable. Markdown is looser and simpler, reStructuredText has extra features that might be useful to you. In a pinch, you could also author in a simple subset of HTML, sticking to very basic elements like paragraphs, lists, and links. HTML is changing a lot at the margins, but the core elements should remain the same for a long time.
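To make the flat-files idea concrete, here is a minimal sketch in Python, assuming each note is a Markdown file that opens with a few "Key: value" lines, then a blank line, then the note itself. That header layout, the `.md` extension, and the names `parse_note` and `build_tag_index` are assumptions made for illustration, not part of Markdown or any existing tool.

```python
from pathlib import Path

def parse_note(text):
    """Split a note into header fields, tags, and the Markdown body.

    Assumed layout (hypothetical): "Key: value" lines, a blank line, the body.
    """
    head, _, body = text.partition("\n\n")
    meta = {}
    for line in head.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip().lower()] = value.strip()
    tags = [t.strip().lower() for t in meta.get("tags", "").split(",") if t.strip()]
    return meta, tags, body.strip()

def build_tag_index(notes_dir):
    """Map each tag to the sorted list of note files in notes_dir that carry it."""
    index = {}
    for path in sorted(Path(notes_dir).glob("*.md")):
        _, tags, _ = parse_note(path.read_text(encoding="utf-8"))
        for tag in tags:
            index.setdefault(tag, []).append(path.name)
    return index
```

Calling `build_tag_index` on a notes directory returns a plain dictionary from tag to file names. Nothing here depends on a database or a web framework, and if the script is ever lost the notes remain readable in any text editor.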
Hmmm, I’m not sure what archivists are doing at Swarthmore, but I’m perplexed by your understanding of “archival practices”. The best thinking about collecting the history of educational institutions accommodates exactly the kind of data you’re talking about.
Archives collect unpublished records of enduring value, regardless of format. Born-digital should not be a big deal to a good archivist. I’m not suggesting that you hand over the custody or platform of this data to an archives — after all, archives are records of enduring value (usually unpublished) that are no longer used in the course of daily business.
But, if you’re looking for advice about information systems and long-term digital preservation, I would recommend talking with someone who’s trained in these issues. Many archives are actively aware (and our literature has been for decades) that the record will be impoverished if we only take what happens to end up in a repository — a move toward “documentation strategy” embraces exactly the kind of recordkeeping that you describe. And archives are taking in raw datasets all the time. There’s an extensive body of professional literature that could help address your questions.
Yes, I’m aware that there’s that literature and I’ve read quite a bit of it. In fact, I’m persuaded by some authors writing within that tradition that this project is worth undertaking. But now my questions are rather more technical and specific and less about the broad philosophical point–and as I think you’re seeing in this thread and elsewhere, those questions are harder to get past the starting line so I can begin doing what I want to do. (As some digital humanists say, “less yacking, more hacking”.)
The comments so far are excellent, and I don’t have much to add.
Drupal is both good and too much. If Eric and his crew are making something for multiple Swarthmore faculty and staff, then it could be worth it. Such an app might have broader use and/or collaborative development possibilities.
Have you talked with Dan and the GMU crew? They might have something in the pipeline.
I am an enthusiastic Drupal user, but would like to point out another disadvantage. You want a “lifelong” solution, and most versions of Drupal have a shelf life of 4-6 years. E.g. version 6 came out early 2008 and will be supported only till version 8 comes out next summer. Version 7, which came out last January, will be supported till version 9 comes out (probably 2015). Updating from one major version of Drupal to another is a major project. WordPress, on the other hand, doesn’t suffer from this issue.