Flying high in the (Document) clouds

DocumentCloud is an online document management system for journalists. It provides a way to upload and organize documents, making them easier to share with the public and with other team members. DocumentCloud also provides a set of tools for working with those documents, including the ability to search across all of the uploaded, publicly accessible documents (an example from the Washington Post).

Currently, my documents are all uploaded to my own server. They’re easily available, true, but if I were someday to be nibbled to death by ducks, the documents would vanish from the internets. In addition, there’s no way to search across the documents, and they’re isolated from other comparable documents, which makes them less useful for general research. Yes, they are accessible via Google, but documents need a bit more.

They need persistence not dependent on a frail human body.

DocumentCloud was kind enough to give me an account once I demonstrated the purpose of my document web site and showed how I’ve used my documents as source material for my writing. Now I’m figuring out how best to organize them for uploading, which includes deciding how to name the document files.

The RECAP folks offered a legitimate criticism of my file naming scheme, which consists of folders containing documents named “document1.pdf” and so on. The names are meaningful in context, but meaningless out of it. This is especially true for my court documents.

I could use the RECAP naming scheme, which consists of the court system, the specific court designation, the case number, the document number, and the attachment number, if any. An example is “gov.uscourts.dcd.129639.1.0.pdf”, which breaks down to the US federal court system (“gov.uscourts”), the District of Columbia district court (“dcd”), case number “129639”, document 1, and attachment 0, meaning the main document rather than an attachment.
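
To see how the RECAP fields fit together, here is a minimal Python sketch that builds and parses names in that style. It assumes only what the example shows: six dot-separated fields (“gov”, “uscourts”, court, case number, document number, attachment number) followed by “.pdf”. The `RecapName` class and the two functions are my own hypothetical helpers, not part of any RECAP tool.

```python
from dataclasses import dataclass

PREFIX = "gov.uscourts"  # US federal court system, as in the example name


@dataclass(frozen=True)
class RecapName:
    """Fields of a RECAP-style name such as gov.uscourts.dcd.129639.1.0.pdf."""
    court: str       # specific court designation, e.g. "dcd"
    case: str        # case number, kept as a string to preserve its exact form
    document: int    # document number
    attachment: int  # 0 means the main document, not an attachment


def build_recap_name(n: RecapName) -> str:
    """Join the fields back into a dot-separated file name."""
    return f"{PREFIX}.{n.court}.{n.case}.{n.document}.{n.attachment}.pdf"


def parse_recap_name(filename: str) -> RecapName:
    """Split a RECAP-style name into its fields; raise ValueError otherwise."""
    if not filename.endswith(".pdf"):
        raise ValueError(f"not a PDF name: {filename!r}")
    parts = filename[: -len(".pdf")].split(".")
    if len(parts) != 6 or ".".join(parts[:2]) != PREFIX:
        raise ValueError(f"not a RECAP-style name: {filename!r}")
    _, _, court, case, document, attachment = parts
    return RecapName(court, case, int(document), int(attachment))
```

Parsing the example name gives `RecapName(court='dcd', case='129639', document=1, attachment=0)`, and `build_recap_name` turns that back into the original string. A name like “document1.pdf” is rejected, which is exactly the out-of-context problem.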

The FOIA Project uses something similar, but it spells out the document type. An example is “dc-1-2010cv00883-complaint-attachment-1.pdf”. This name breaks down to the DC court system, the year and case type (“2010” and “cv” for Civil), and the case number (“00883”), followed by the document type (“complaint”) and the attachment number. (As a side note, the FOIA Project also uses DocumentCloud.)
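
The FOIA Project names are harder to split, since the year, case type, and case number run together and a document type could itself contain hyphens. A regular expression with named groups handles that. This sketch is based only on the single example name; the example doesn't say what the “1” after “dc” means, so I’ve guessed it’s an office or division number (federal case numbers are commonly written like 1:10-cv-00883) and named the group `office`, but treat that as an assumption. `parse_foia_name` is a hypothetical helper, not a FOIA Project API.

```python
import re
from typing import Optional

FOIA_PATTERN = re.compile(
    r"^(?P<court>[a-z]+)"                      # court system, e.g. "dc"
    r"-(?P<office>\d+)"                        # assumed office/division number
    r"-(?P<year>\d{4})"                        # year, e.g. "2010"
    r"(?P<case_type>[a-z]+)"                   # case type, e.g. "cv" for Civil
    r"(?P<case_number>\d+)"                    # case number, e.g. "00883"
    r"-(?P<doc_type>[a-z]+(?:-[a-z]+)*?)"      # document type, e.g. "complaint"
    r"(?:-attachment-(?P<attachment>\d+))?"    # optional attachment number
    r"\.pdf$"
)


def parse_foia_name(filename: str) -> dict[str, Optional[str]]:
    """Return the named fields; 'attachment' is None for a main document."""
    m = FOIA_PATTERN.match(filename)
    if m is None:
        raise ValueError(f"not a FOIA Project-style name: {filename!r}")
    return m.groupdict()
```

For the example name, this yields court “dc”, year “2010”, case type “cv”, case number “00883”, document type “complaint”, and attachment “1”. A name without the attachment suffix gives `None` for the attachment.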

PACER assigns each case a unique PACER number within the system, but that’s only useful when accessing the documents via PACER.

I’ll have to live with whatever I decide, because some of the cases I follow have hundreds, even thousands, of court documents. I only want to do the “rename and upload to DocumentCloud” thing once.
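
Since the renaming only happens once, it’s worth doing as a dry run first. Here is a minimal Python sketch, assuming my current layout of one folder per case holding “document1.pdf”, “document2.pdf”, and so on, and assuming RECAP-style target names. The folder-to-case mapping is hypothetical and would have to be filled in by hand, and every file is treated as attachment 0 because the old names don’t record attachments.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the current per-folder names: document1.pdf, document2.pdf, ...
OLD_NAME = re.compile(r"^document(\d+)\.pdf$")


def recap_name_for(old_name: str, court: str, case: str) -> Optional[str]:
    """Map 'documentN.pdf' to a RECAP-style name, or None if it doesn't match.

    Assumes attachment 0 (main document), since old names don't record attachments.
    """
    m = OLD_NAME.match(old_name)
    if m is None:
        return None
    return f"gov.uscourts.{court}.{case}.{int(m.group(1))}.0.pdf"


def plan_renames(root: Path, case_folders: dict[str, tuple[str, str]]) -> list[tuple[Path, Path]]:
    """Build (old, new) path pairs without touching the disk.

    case_folders is a hypothetical mapping: folder name -> (court, case number).
    """
    plan = []
    for folder, (court, case) in sorted(case_folders.items()):
        for path in sorted((root / folder).glob("document*.pdf")):
            new_name = recap_name_for(path.name, court, case)
            if new_name is not None:
                plan.append((path, path.with_name(new_name)))
    return plan


def apply_plan(plan: list[tuple[Path, Path]], dry_run: bool = True) -> None:
    """Print each rename; only rename on disk when dry_run is False."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"refusing to overwrite {new}")
        print(f"{old} -> {new}")
        if not dry_run:
            old.rename(new)
```

Something like `apply_plan(plan_renames(Path("documents"), {"some-case": ("dcd", "129639")}))` would only print the planned renames; passing `dry_run=False` performs them. The folder names there are placeholders. Keeping the plan separate from the renaming means the whole mapping can be checked before anything irreversible happens.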

