Posts Tagged ‘National Archives of Australia’

It’s been a while since I’ve posted here purely on digital preservation issues: my work has moved in other directions, although I did attend a number of the digital preservation sessions at the Society of American Archivists’ conference this summer.  I retain a keen interest in digital preservation, however, particularly in developments which might be useful for smaller archives.  Recently, I’ve been engaged in a little work for a project called DiSARM (Digital Scenarios for Archives and Records Management), preparing some teaching materials for the Masters students at UCL to work from next term, and in revising the contents of a guest lecture I present to the University of Liverpool MARM students on ‘Digital Preservation for the Small Repository’.  Consequently, I’ve been trying to catch up on the last couple of years (since I left West Yorkshire Archive Service at the end of 2009) of new digital preservation projects and research.

So what’s new?  Well, from a small archives perspective, I think the key development has been the emergence of several digital curation workflow management systems – Archivematica, Curator’s Workbench, the National Archives of Australia’s Digital Preservation Software Platform (others…?) – which package together a number of different tools to guide the archivist through a sequenced set of stages for processing digital content.  The currently available systems vary in their approaches to preservation, comprehensiveness, and levels of maturity, but they represent a major step forward from the situation just a couple of years ago.  In 2008, if (like me when WYAS took in the MLA Yorkshire archive as a testbed) you didn’t have much (or any) money available, your only option was – as one of the former Liverpool students memorably pointed out to me – to cobble together a set of tools as best you could from old socks and a bit of string.  Now we have several offerings approaching an integrated software solution; moreover, these packages are generally open source and freely available, so would-be adopters can download each one and play about with it before deciding which might suit them best.
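To give a sense of what a ‘sequenced set of stages’ means in practice, here is a minimal sketch in Python of the sort of pipeline these suites automate: record a checksum, identify the format, and so on, for every file in an accession.  The stage functions and the sample directory name are mine, purely for illustration; this is not the actual workflow of Archivematica, Curator’s Workbench or the DPSP, each of which wraps dedicated tools (DROID or FITS for identification, ClamAV for virus checking, and so on) and records far richer preservation metadata.

    import hashlib
    import mimetypes
    from pathlib import Path

    def checksum(path):
        """Stage 1: record a fixity value (SHA-256) for the file."""
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def identify_format(path):
        """Stage 2: very rough format identification by file extension.
        A real suite would call a dedicated tool such as DROID or FITS."""
        guess, _ = mimetypes.guess_type(path.name)
        return guess or "unknown"

    def ingest(accession_dir):
        """Run every file in the accession through the sequenced stages."""
        records = []
        for path in sorted(Path(accession_dir).rglob("*")):
            if path.is_file():
                records.append({
                    "file": str(path),
                    "sha256": checksum(path),
                    "format": identify_format(path),
                    # further stages (virus scan, normalisation to preservation
                    # formats) would call external tools at this point
                })
        return records

    if __name__ == "__main__":
        for record in ingest("sample_accession"):   # hypothetical directory name
            print(record)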

Having said that, I still think it is important that students (and practitioners, of course) understand the preservation strategies and assumptions underlying each software suite.  When we learn how to catalogue archives, we are not trained merely to use a particular software tool.  Rather, we are taught the principles of archival description, and then we move on to see how these concepts are implemented in practice in EAD or by using specific database applications, such as (in the U.K.) CALM or Adlib.  For DiSARM, students will design a workflow and attempt to process a small sample set of digital documents using their choice of one or more of the currently available preservation tools, which they will be expected to download and install themselves.  This Do-It-Yourself approach will mirror the practical reality in many small archives, where the (frequently lone) archivist often has little access to professional IT support.  Similarly, students at UCL are not permitted to install software onto the university network.  Rather than see this as a barrier, again I prefer to treat this situation as a reflection of organisational reality.  There are a number of very good reasons why you would not want to process digital archives directly on your organisation’s internal network, and recycling and re-purposing old computer equipment of varying technical specifications and capabilities to serve as workstations for ingest is a fact of life even, it seems, for Mellon-funded projects!

In preparation for writing this DiSARM task, I began to put together for my own reference a spreadsheet listing all the applications I could think of, or have heard referenced recently, which might be useful for preservation processing tasks in small archives.  I set out to record:

  • the version number of the latest (stable) release
  • the licence arrangements for each tool
  • the URL from which the software can be downloaded
  • basic system requirements (essentially the platform(s) on which the software can be run – we have surveyed the class and know there is a broad range of operating systems in use, including several flavours of both Linux and Windows, and Mac OS X)
  • location of further documentation for each application
  • end-user support availability (forums or mailing lists etc)

This all proved surprisingly difficult.  I was half expecting that user-friendly documentation and (especially) support might often be lacking in the smaller projects, but several websites also lack clear statements about system requirements or the legal conditions under which the software may be installed and used.  Does ‘educational use and research’ cover a local authority archives providing research services to the general public (including academics)?  Probably not, but it would presumably allow for use in a university archives.  Thanks to the wonders of interpreted programming languages (mostly Java, but Python also puts in an occasional appearance), many tools are effectively cross-platform, but it is astonishing how many projects fail to say so clearly.  This is self-evident to a developer, of course, but not at all obvious to an archivist, who will probably be worried about bringing coffee into the repository, let alone a reptile.  Oh, and if you expect your software to be compiled from source code, or require sundry other faffing around at a command line before use, I’m sorry, but your application is not “easy to implement” for ordinary mortals, as more than one site claimed.  Is it really so hard to generate binary executables for common operating systems (or, if you have a good excuse – such as Archivematica, which is still in alpha development – at least to provide detailed step-by-step instructions)?  Many projects of course make use of SourceForge to host code, but use another website for documentation and updates – it can be quite confusing finding your way around.  The venerable ClamAV seems to have undergone some kind of Windows conversion, and although I’m sure that Unix packages must be there somewhere, I’m damned if I could find them easily…
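For what it’s worth, the inventory itself needs nothing fancier than a spreadsheet, but since I was assembling it anyway, here is a minimal sketch of the same template as a CSV written with Python’s standard library.  The column headings mirror the list above; the single example row contains placeholder values only, not the details of any real application.

    import csv

    COLUMNS = [
        "tool", "latest_stable_version", "licence", "download_url",
        "system_requirements", "documentation", "end_user_support",
    ]

    # Placeholder row only: not the details of any real application.
    EXAMPLE_ROW = {
        "tool": "Example Tool",
        "latest_stable_version": "x.y.z",
        "licence": "to be confirmed from the project website",
        "download_url": "http://example.org/download",
        "system_requirements": "Java, so effectively cross-platform(?)",
        "documentation": "http://example.org/docs",
        "end_user_support": "mailing list / forum",
    }

    with open("dp_tools_inventory.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerow(EXAMPLE_ROW)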

All of which plays into a wider debate about just how far the modern archivist’s digital skills ought to reach (there are many other versions of this debate; the one linked – from 2006, so now quite old – just happens to be one of the most comprehensive attempts to define a required digital skill set for information practitioners).  No doubt there will be readers of this post who believe that archivists shouldn’t be dabbling in this sort of stuff at all, especially if they also work for an organisation which lacks the resources to establish a reliable infrastructure for a trusted digital repository.  And certainly I’ve been wondering lately whether some kind of archivists’ equivalent of The Programming Historian would be welcome or useful, teaching basic coding tailored to common tasks that an archivist might need to carry out.  But essentially, I don’t subscribe to the view that all archivists need to re-train as computer scientists or IT professionals.  Of course, these skills are still needed (obviously!) within the digital preservation community, but to drive a car I don’t need to be a mechanic or have a deep understanding of transport infrastructure.  Digital preservation needs to open up spaces around the periphery of the community where newcomers can experiment and learn, otherwise it will become an increasingly closed and ultimately moribund endeavour.
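By way of illustration, the kind of ‘common task’ such a resource might cover is on the scale of the sketch below: a few lines of Python (my choice of language here, purely for the example) that walk a folder of files and write out a manifest of names, sizes and checksums.  The folder and file names are assumptions of mine; the point is simply that this is the level of coding involved, not systems programming.

    import csv
    import hashlib
    from pathlib import Path

    ACCESSION = Path("accession")      # hypothetical folder of files to describe
    MANIFEST = Path("manifest.csv")

    def sha256(path):
        """Checksum the file in chunks so large files don't exhaust memory."""
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    with MANIFEST.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "bytes", "sha256"])
        for item in sorted(ACCESSION.rglob("*")):
            if item.is_file():
                writer.writerow([str(item), item.stat().st_size, sha256(item)])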


Day 3 of ECDL started for me with the Query Log Analysis session.  I thought perhaps that, now the papers were getting heavily into IR technicalities, I might not understand what was being presented or that it would be less relevant to archives.  How wrong can you be!  Well, ok, IR metrics are complex, especially for someone new to the field, but when the first presentation was based upon a usability study of the EAD finding aids at the Nationaal Archief (the National Archives of the Netherlands), it wasn’t too difficult to spot the relevance.  In fact, it was interesting to see how, when the test data are presented in a foreign language, you notice things that you wouldn’t necessarily observe in your mother tongue.  In the case of the Nationaal Archief, I was horrified at how many clicks were required to reach an item description.  Most archives have this problem with web-based finding aids (unless they merely replicate a traditional format, for instance a PDF copy of a paper list), but somehow it was so much more obvious when I wasn’t quite sure exactly what was being presented to me at each stage of the results.  This is what it must be like to be an archival novice.  No wonder they give up.

The second paper of the morning, Determining Time of Queries for Re-ranking Search Results, was also very pertinent to searching in an archival context.  It discussed ‘temporal documents’ where either the terminology itself has changed over time or time is highly relevant to the query.  This temporal intent may be either implicit or explicit in the query.  For example, ‘tsunami + Thailand’ is likely to refer to the 2004 tsunami.  These kinds of issues are obviously very important for historians, and for archivists making temporal collections available in a web environment, such as web archives and online archival finding aids.
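To illustrate the explicit case only (this is a toy of my own, not the method described in the paper, and the documents and scores are invented), a year written out in the query can simply be used to boost documents dated to that year; the genuinely hard problem the authors address is the implicit case, where a query like ‘tsunami + Thailand’ carries no date at all.

    import re

    def rerank(query, results):
        """results: list of (title, year, base_score) tuples.
        Boost documents whose year matches a year written in the query."""
        match = re.search(r"\b(19|20)\d{2}\b", query)
        query_year = int(match.group()) if match else None

        def score(item):
            _title, year, base = item
            return base + (1.0 if year == query_year else 0.0)

        return sorted(results, key=score, reverse=True)

    # Invented documents and scores, for illustration only.
    docs = [
        ("Tsunami warning system review", 2011, 0.7),
        ("Boxing Day tsunami appeal", 2004, 0.6),
    ]
    print(rerank("tsunami Thailand 2004", docs))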

Later in the morning, I was down to attend the stream on Domain-specific Digital Libraries.  One of these specific domains turned out to be archives, with an (appropriately) very philosophical paper presented by Pierre-Edouard Portier about DINAH [in French].  This is “a philological platform for the construction of multi-structured documents”, created to enable the transcription and annotation of the papers of the French philosopher Jean-Toussaint Desanti, and to facilitate the visualization of the traces of user activity.  My tweeting of this paper (limited on account of both the presentation’s intellectual and technical complexity and the fact that I’d got to bed at around 3am that morning!) seemed to catch the attention of both the archival profession and the Linked Data community; it certainly deserves some further coverage in the English-speaking archival professional literature.

In the same session, I was also interested in the visualization techniques presented for time-oriented scientific data by Jürgen Bernard, which reminded me of The Visible Archive research project funded by the National Archives of Australia.  The principle – that visual presentations are a useful, possibly preferable, alternative to text-based descriptions of huge series of data – is the same in both cases.  Similarly, the PROBADO project has investigated the development of tools to store and retrieve complex, non-textual data and objects, such as 3D CAD drawings and music.  There were important implications from all of these papers for the future development of archival finding aids.

In the afternoon, I found myself helping out at the Networked Knowledge Organization Systems/Services (NKOS) workshop.  I wasn’t really sure what this would entail, but it turned out to involve things like thesaurus construction and semantic mapping between systems, all of which is very relevant to the UK Archives Discovery (UKAD) Network objectives.  I was particularly sorry I was unable to make the Friday session of the workshop, which was to be all about user-centred knowledge system design and Linked Data; however, the slides are all available with the programme for the workshop.

Once again, my sincere thanks to the conference organisers for my opportunity to participate in ECDL2010.  The conference proceedings are available from Springer, for those who want to follow up further, and presentation slides are gradually appearing on the conference website.
