Digitization: a remarkable success

Last week’s work to scan, edit, OCR, and describe basic metadata for documents was a genuine success. Although it began fitfully with some technical hiccups – as do all attempts to use technology with as much complexity as we did – I was exceptionally pleased by the outcome. In about 30 minutes of work we scanned more than 120 documents, described 31 of them with metadata, and edited and checked the OCR quality of a small but valuable collection. This may not seem like much work, but remember that none of us, including me as the instructor, had ever created or worked within this kind of digital data pipeline. None of you had any depth of experience with these technologies or these practices, and yet we were able to work efficiently from the beginning.

Just as a note, I raced to scan the rest of the documents before having to return the box to special collections that same afternoon. I noticed that perhaps 50 documents were captured with the camera out of focus. These will need to be redone.

To me this is a perfect example of a digital humanities project. It was “humanities” in that we were interested in researching the primary documents relating to the Old Chapel building at UMass. Although we hardly dealt with the content of these documents at all, this would not be much different in a non-digital study. Taking a census of your materials, sorting them into kinds, and making a rough catalog of those materials would be a natural first step for any humanist confronted with a box of over 400 documents. The “digital” component of our work is just as obvious, though its impact might not be. With seven computers all networked together, we were able to scan the documents and create image files that were both delivered almost instantly to the editing team and parceled out among the four metadata teams to be described. Armed only with minimal instruction and a handful of embedded notes, these metadata teams were all able to work simultaneously within a single document living “in the cloud”, specifically in a Google Spreadsheet. The efficiency this pipeline created is remarkable. An individual using a flatbed scanner would have taken about 20 hours just to scan the files. It took us (including my time and class time) less than three hours. The same can be said about the organization of information. How much time would it take an individual just to rename all the scanned images?
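For the curious, the renaming question above is exactly the kind of chore a few lines of code can absorb. What follows is only a rough sketch in Python, not what we actually did in class. The folder layout, the `oldchapel` prefix, the `.jpg` extension, and the function names `plan_renames` and `apply_renames` are all assumptions made up for illustration.

```python
from pathlib import Path


def plan_renames(folder, prefix="oldchapel", start=1, ext=".jpg"):
    """Pair each scan in `folder` with a new sequential name.

    The prefix and extension are hypothetical defaults, not the names
    used in the class project. Files are taken in sorted order and
    anything that is not an image with the given extension is skipped.
    """
    scans = sorted(p for p in Path(folder).iterdir()
                   if p.is_file() and p.suffix.lower() == ext)
    return [
        (old, old.with_name(f"{prefix}_{number:04d}{ext}"))
        for number, old in enumerate(scans, start=start)
    ]


def apply_renames(pairs):
    """Carry out the planned renames, refusing to overwrite anything."""
    for old, new in pairs:
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)
```

Splitting the job into a plan and an apply step means you can print the list of pairs and look it over before any file is touched, which matters when the originals are hard to recapture.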

This is the DH experience: as a team of amateurs we worked at once independently and collaboratively on a set of historical documents, capturing their content and preparing (via metadata) to share it with the world and to preserve it for the future. We did all that in half an hour.

Finally, as you all saw, there was the delightful surprise of physical artifacts as well as paper records in the box of materials from special collections. I also photographed these objects and created a three-dimensional model of one of them. Click below to see it in two different views, as a shaded solid (top) and in “X-ray”.

Chisel (Click to view in 3D)

Chisel (Click to view in 3D)

PS – for those who care, this is the Polaroid picture reference.

