Posts Tagged ‘Archivematica’

It’s been a while since I’ve posted here purely on digital preservation issues: my work has moved in other directions, although I did attend a number of the digital preservation sessions at the Society of American Archivists’ conference this summer.  I retain a keen interest in digital preservation, however, particularly in developments which might be useful for smaller archives.  Recently, I’ve been engaged in a little work for a project called DiSARM (Digital Scenarios for Archives and Records Management), preparing some teaching materials for the Masters students at UCL to work from next term, and in revising the contents of a guest lecture I present to the University of Liverpool MARM students on ‘Digital Preservation for the Small Repository’.  Consequently, I’ve been trying to catch up on the last couple of years (since I left West Yorkshire Archive Service at the end of 2009) of new digital preservation projects and research.

So what’s new?  Well, from a small archives perspective, I think the key development has been the emergence of several digital curation workflow management systems – Archivematica, Curator’s Workbench, the National Archives of Australia’s Digital Preservation Software Platform (others…?) – which package together a number of different tools to guide the archivist through a sequenced set of stages for the processing of digital content.  The currently available systems vary in their approaches to preservation, comprehensiveness, and levels of maturity, but represent a major step forward from the situation just a couple of years ago.  In 2008, if (like me when WYAS took in the MLA Yorkshire archive as a testbed), you didn’t have much (or any) money available, your only option was – as one of the former Liverpool students memorably pointed out to me – to cobble together a set of tools as best you could from old socks and a bit of string.  Now we have several offerings approaching an integrated software solution; moreover, these packages are generally open source and freely available, so would-be adopters are able to download each one and play about with it before deciding which one might suit them best.

Having said that, I still think it is important that students (and practitioners, of course) understand the preservation strategies and assumptions underlying each software suite.  When we learn how to catalogue archives, we are not trained merely to use a particular software tool.  Rather, we are taught the principles of archival description, and then we move on to see how these concepts are implemented in practice in EAD or by using specific database applications, such as (in the U.K.) CALM or Adlib.  For DiSARM, students will design a workflow and attempt to process a small sample set of digital documents using their choice of one or more of the currently available preservation tools, which they will be expected to download and install themselves.  This Do-It-Yourself approach will mirror the practical reality in many small archives, where the (frequently lone) archivist often has little access to professional IT support.  Similarly, students at UCL are not permitted to install software onto the university network.  Rather than see this as a barrier, again I prefer to treat this situation as a reflection of organisational reality.  There are a number of very good reasons why you would not want to process digital archives directly onto your organisation’s internal network, and recycling or re-purposing old computer equipment of varying technical specifications and capabilities to serve as workstations for ingest is a fact of life even, it seems, for Mellon-funded projects!

In preparation for writing this DiSARM task, I began to put together for my own reference a spreadsheet listing all the applications I could think of, or have heard referenced recently, which might be useful for preservation processing tasks in small archives.  I set out to record:

  • the version number of the latest (stable) release
  • the licence arrangements for each tool
  • the URL from which the software can be downloaded
  • basic system requirements (essentially the platform(s) on which the software can be run – we have surveyed the class and know there is a broad range of operating systems in use, including several flavours of both Linux and Windows, and Mac OS X)
  • location of further documentation for each application
  • end-user support availability (forums or mailing lists etc)
This all proved surprisingly difficult.  I was half expecting that user-friendly documentation and (especially) support might often be lacking in the smaller projects, but several websites also lack clear statements about system requirements or the legal conditions under which the software may be installed and used.  Does ‘educational use and research’ cover a local authority archives providing research services to the general public (including academics)?  Probably not, but it would presumably allow for use in a university archives.  Thanks to the wonders of interpreted programming languages (mostly Java, but Python also puts in an occasional appearance), many tools are effectively cross-platform, but it is astonishing how many projects fail to say so clearly.  This is self-evident to a developer, of course, but not at all obvious to an archivist, who will probably be worried about bringing coffee into the repository, let alone a reptile.  Oh, and if you expect your software to be compiled from code, or require sundry other faffing around at a command line before use, I’m sorry, but your application is not “easy to implement” for ordinary mortals, as more than one site claimed.  Is it really so hard to generate binary executables for common operating systems (or if you have a good excuse – such as Archivematica which is still in alpha development – at least provide detailed step-by-step instructions)?  Many projects of course make use of SourceForge to host code, but use another website for documentation and updates – it can be quite confusing finding your way around.  The venerable ClamAV seems to have undergone some kind of Windows conversion, and although I’m sure that Unix packages must be there somewhere, I’m damned if I could find them easily…
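
For anyone who would rather keep this kind of inventory in a script than a spreadsheet, here is a minimal sketch in Python of the six things I set out to record, plus a check for the gaps that proved so common.  The class name, field names and the ‘ExampleTool’ entry are all my own placeholders for illustration, not details of any real tool or of the spreadsheet itself.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class ToolRecord:
    """One row of the tool inventory; every field except name may be unknown."""
    name: str
    latest_stable_version: Optional[str] = None
    licence: Optional[str] = None
    download_url: Optional[str] = None
    platforms: Optional[str] = None          # e.g. "Linux; Windows; Mac OS X"
    documentation_url: Optional[str] = None
    support_channels: Optional[str] = None   # forums, mailing lists etc


def missing_fields(record: ToolRecord) -> List[str]:
    """Return the names of fields the project website did not let us fill in."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, "")]


def to_csv(records: List[ToolRecord]) -> str:
    """Serialise the inventory as CSV text so it can be opened in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ToolRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))  # unknown (None) values become empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical entry: only the name and licence could be found on the website.
    example = ToolRecord(name="ExampleTool", licence="GPL-3.0")
    print(missing_fields(example))
    print(to_csv([example]))
```

Listing the empty fields for each tool makes it obvious at a glance which projects fail to state their system requirements or licence terms.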

All of which plays into a wider debate about just how far the modern archivist’s digital skills ought to reach (there are many other versions of this debate, the one linked – from 2006 so now quite old – just happens to be one of the most comprehensive attempts to define a required digital skill set for information practitioners).  No doubt there will be readers of this post who believe that an archivist shouldn’t be dabbling in this sort of stuff at all, especially if s/he also works for an organisation which lacks the resources to establish a reliable infrastructure for a trusted digital repository.  And certainly I’ve been wondering lately whether some kind of archivists’ equivalent of The Programming Historian would be welcome or useful, teaching basic coding tailored to common tasks that an archivist might need to carry out.  But essentially, I don’t subscribe to the view that all archivists need to re-train as computer scientists or IT professionals.  Of course, these skills are still needed (obviously!) within the digital preservation community, but to drive a car I don’t need to be a mechanic or have a deep understanding of transport infrastructure.  Digital preservation needs to open up spaces around the periphery of the community where newcomers can experiment and learn, otherwise it will become an increasingly closed and ultimately moribund endeavour.


Chris Prom‘s talk on his Fulbright research ‘Tools for implementing Digital Preservation Standards’ for the ‘under resourced’ archive at the Society of Archivists’ Data Standards Group meeting (presentation slides should be available here shortly) yesterday has finally spurred me into posting a roundup of projects which I’ve encountered over the last couple of months, which are specifically relevant to digital preservation in a small archives repository.

When I embarked upon my Churchill Fellowship in 2008, practical implementations of digital preservation research were only occurring in large repositories, usually at a national or sometimes state level.  With the notable exception of the Paradigm project and related work at Oxford University, there had been few attempts to scale down the large programmes, or to package up the various tools available with the products of digital library/repository world, as envisaged by the 2007 UNESCO report Towards an Open Source Archival Repository and Preservation System.  The smaller programmes I did visit were generally concentrating on a niche subset of digital archives (for example, email or web archives).

Dedicated followers of digital preservation issues are probably already aware of the RODA repository created on a Fedora base by the Portuguese National Archives, and may have read this review of the demo site from another UK local archivist.  Chris Prom is now embarking on a more formal assessment, and his blog postings on RODA (and the evaluation criteria he is using) make for worthwhile reading.  RODA is likely to be of particular interest to UK-based archivists who use the collections management software package, CALM, since this is also in use at the Portuguese National Archives, although there doesn’t seem to have been any attempt to date to link the two together.  The obvious question is: what happens with a hybrid accession?

Chris also introduced yesterday’s audience to a new project, Archivematica, which is packaging already available open source preservation tools into an Ubuntu Linux-based virtual appliance.  As the project’s wiki explains, ‘This means an entire suite of digital preservation tools is now available to the average archivist from one simple installation’.  This is a really exciting development and I am looking forward to seeing the results of Chris’s evaluation.  Archivematica is developed by the same Canadian team, Artefactual Systems, who are behind the ICA-AtoM archival description software commissioned by the International Council on Archives.

Closer to home, since I am involved on the board for one of the projects, it is remiss of me not to have mentioned before on this blog the digital curation work going on at Gloucestershire Archives, although the website itself has only been made available relatively recently.  This work is the first real attempt to develop a practical digital curation architecture in a UK local authority archives setting (as opposed to simple re-use of existing tools, piecemeal).  Plenty to explore here.

And finally, on a less technical level, but nevertheless, I think, an important development.  At the sixth of the Society of Archivists’ roadshows in December 2009, I was delighted to hear of Kevin Bolton‘s work in drawing up simple accessioning checklists for digital archives at Manchester Archives and Local Studies, and – most importantly – how these are being developed regionally for the North West, in conjunction with Cheshire Archives and Local Studies.  Particularly at this time of economic recession (or are we supposed to be out of that now?) I believe it is vital that smaller archives pool their resources and work in partnership to find solutions to digital archives issues, and it is good to see a framework for the future being mapped out here in the North West.
