Selection, retention and destruction

Nobody knows exactly what data will be wanted or needed in the future, but you need to consider what should or must be kept, and what can or should be deleted or disposed of.

Why select and appraise

It is not possible for all digital data to be kept forever, but outside the archive and library communities there is no widespread recognition of the need to select data for curation. Instead there is a view that "storage is cheap so why don’t we just decide to keep everything". While that may in theory be technologically possible, in practice there are four main objections to this view:

  1. Digital content expands. And "…if the growth of content (per byte or per object) keeps pace with the declining cost [of storage], then the real cost of keeping everything may actually be the same as it is now, or higher".
  2. Backup and mirroring increase costs. No digital preservation approach can survive without appropriate mirroring and backup systems. This instantly increases the storage cost by at least a factor of two.
  3. Discovery gets harder. Keeping everything means that the noise-to-signal ratio of searches will be high, requiring additional individual effort to ascertain which data is the intended target of a search.
  4. Managing and preserving is expensive. We must consider the cost of creating and managing preservation metadata, and the cost of preservation actions on data that does need to be retained.

From Appraise and Select Research Data for Curation – DCC and Australian National Data Service
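
The first two objections are arithmetic, so a rough sketch may help. The Python below is a minimal illustration, not a costing model from the guide: the function name, the starting volume, the price per terabyte and the growth and decline rates are all assumptions chosen for the example. It shows that when data volume grows about as fast as the unit price of storage falls, the yearly bill stays flat, and that keeping a second copy doubles it.

```python
def annual_storage_cost(initial_tb, cost_per_tb, growth_rate,
                        price_decline, copies, years):
    """Return the storage bill for each year as a list.

    Every figure passed in is an illustrative assumption:
      initial_tb     data held in year 0, in terabytes
      cost_per_tb    price of one terabyte for one year, in year 0
      growth_rate    yearly growth in data volume (0.25 = 25%)
      price_decline  yearly fall in the unit price (0.20 = 20%)
      copies         number of copies kept (primary plus mirrors/backups)
    """
    costs = []
    volume = initial_tb
    price = cost_per_tb
    for _ in range(years):
        costs.append(volume * price * copies)
        volume *= 1 + growth_rate    # content expands
        price *= 1 - price_decline   # storage gets cheaper
    return costs


# Growth of 25% against a 20% price fall: 1.25 * 0.80 = 1.00, so the bill stays flat.
flat = annual_storage_cost(10, 100, 0.25, 0.20, copies=2, years=5)

# Faster growth (40%) with the same price fall: the bill rises every year.
rising = annual_storage_cost(10, 100, 0.40, 0.20, copies=2, years=5)

print([round(c) for c in flat])
print([round(c) for c in rising])
```

The sketch covers raw storage only. It leaves out the discovery and preservation costs in objections 3 and 4, which would add to the total.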

Checklist

  • What data does my funder require me to keep?
  • What data does the university require me to keep?
  • What does legislation require me to keep or destroy? (See the Data Protection Act principles: “Personal data processed for any purpose or purposes shall not be kept for longer than is necessary for that purpose or those purposes”.)
  • Is this data 'vital' to the project or organisation?
  • Do I have the legal and intellectual property rights to keep and re-use this data? If not, can these be negotiated?
  • Is there sufficient documentation to explain the data, and allow the data or record to be found wherever it ends up being stored?
  • If I need to pay to keep the data, can I afford it?
  • Is the data transient/transitory?
  • Ensure that what you keep is well documented – will anyone know what it is?
  • Store, name and organise appropriately – consider for example what will happen when you leave and the data is left for others to manage and use.