Archive for the ‘Digital Preservation Networks’ Category

Chris Prom’s talk yesterday at the Society of Archivists’ Data Standards Group meeting, on his Fulbright research into ‘Tools for implementing Digital Preservation Standards’ for the ‘under-resourced’ archive (presentation slides should be available shortly), has finally spurred me into posting a roundup of projects I’ve encountered over the last couple of months that are specifically relevant to digital preservation in a small archives repository.

When I embarked upon my Churchill Fellowship in 2008, practical implementations of digital preservation research were only occurring in large repositories, usually at a national or sometimes state level. With the notable exception of the Paradigm project and related work at Oxford University, there had been few attempts to scale down the large programmes, or to package the various available tools together with the products of the digital library/repository world, as envisaged by the 2007 UNESCO report Towards an Open Source Archival Repository and Preservation System. The smaller programmes I did visit were generally concentrating on a niche subset of digital archives (for example, email or web archives).

Dedicated followers of digital preservation issues are probably already aware of the RODA repository, built on Fedora by the Portuguese National Archives, and may have read this review of the demo site from another UK local archivist. Chris Prom is now embarking on a more formal assessment, and his blog postings on RODA (and the evaluation criteria he is using) make for worthwhile reading. RODA is likely to be of particular interest to UK-based archivists who use the collections management software package CALM, since CALM is also in use at the Portuguese National Archives, although there doesn’t seem to have been any attempt to date to link the two together. The obvious question is: what happens with a hybrid accession?

Chris also introduced yesterday’s audience to a new project, Archivematica, which packages existing open source preservation tools into an Ubuntu Linux-based virtual appliance. As the project’s wiki explains, ‘This means an entire suite of digital preservation tools is now available to the average archivist from one simple installation’. This is a really exciting development, and I am looking forward to seeing the results of Chris’s evaluation. Archivematica is being developed by the same Canadian team, Artefactual Systems, that is behind the ICA-AtoM archival description software commissioned by the International Council on Archives.

Closer to home, since I sit on the board of one of the projects, it is remiss of me not to have mentioned the digital curation work going on at Gloucestershire Archives on this blog before, although the website itself has only been made available relatively recently. This work is the first real attempt to develop a practical digital curation architecture in a UK local authority archives setting (as opposed to piecemeal re-use of existing tools). Plenty to explore here.

And finally, a less technical but, I think, nevertheless important development. At the sixth of the Society of Archivists’ roadshows in December 2009, I was delighted to hear of Kevin Bolton’s work in drawing up simple accessioning checklists for digital archives at Manchester Archives and Local Studies, and, most importantly, of how these are being developed regionally for the North West, in conjunction with Cheshire Archives and Local Studies. Particularly at this time of economic recession (or are we supposed to be out of that now?), I believe it is vital that smaller archives pool their resources and work in partnership to find solutions to digital archives issues, and it is good to see a framework for the future being mapped out here in the North West.

Some exciting news today: the West Yorkshire Archive Service (WYAS) submission to the InterPares 3 Research Project for a case study of the MLA Yorkshire archives has been accepted. MLA Yorkshire, the lead strategic agency for museums, libraries and archives in the region, closes this week as part of a national restructuring of the wider organisation, so its live website might not be available for much longer. (In fact, I’ve been experimenting with the Internet Archive’s Archive-It service as part of the MLA Yorkshire archives work.) I’ve spent much of the past few days arranging the transfer of both paper and digital archives from the local office in Leeds.

InterPares 3 focuses on implementing the theory of digital preservation in small and medium-sized archives, and should provide an excellent chance for WYAS to build up in-house digital preservation expertise as we feel our way with this, our first large-scale digital deposit.  I’m really excited about this opportunity, and I hope to document how we get on with the project on this blog.

Presentations from the successful open consultation day held at TNA on 12 November on digital preservation for local authority archivists are now available on the DPC website – including my report on my Churchill Fellowship research in the US and Australia.  Also featured were colleagues from other local authority services already active in practical digital preservation initiatives – Heather Needham on ingest work at Hampshire, Viv Cothey reporting on his GAIP tool developed for Gloucestershire Archives, and Kevin Bolton on web archiving work at Manchester City. 

Heather and I also reported back on the results of the digital preservation survey of local authorities, and a copy of the interim report is now available on the DPC site. A paper incorporating the discussion of the survey from the afternoon sessions of the consultation event will be published in Ariadne in January 2009.

Lots of interesting work going on at North Carolina State Archives – plenty to read on their electronic records page. One project I’d particularly like to highlight is their work on the preservation of e-mail.

E-mail seems to be one of those types of electronic record whose preservation difficulties have been discussed at great length, but about which there has been much less practical advice (at least that I knew of) on how you might actually go about keeping it.

As well as the very practical guidelines for users, and suggested retention periods for e-mail, staff in the North Carolina State Archives Government Records Branch have been working on a collaborative project to transform e-mail from its native format into XML for preservation. The catalyst for this project was the deposit of e-mail messages from a former North Carolina governor and his staff. The website for the e-mail project has a full set of documentation, and links to other e-mail preservation initiatives. More recently, North Carolina has been working with the Collaborative Electronic Records Project (CERP) at the Smithsonian Institution Archives and the Rockefeller Archive Center, and an XML schema for a single e-mail account has now been published.
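To give a flavour of what converting e-mail ‘from its native format into XML’ involves, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the North Carolina/CERP tooling: the element names (`Message`, `Headers`, `Header`, `Body`) and the function `message_to_xml` are hypothetical and do not follow the published CERP account schema, and attachments and HTML parts are simply ignored.

```python
import email
import xml.etree.ElementTree as ET
from email import policy


def message_to_xml(raw: str) -> ET.Element:
    """Convert one RFC 5322 message into a simple XML element.

    The element names used here are hypothetical; they do NOT follow
    the CERP / North Carolina e-mail account schema.
    """
    msg = email.message_from_string(raw, policy=policy.default)
    root = ET.Element("Message")

    # Keep every header, in its original order, as name/value pairs.
    headers = ET.SubElement(root, "Headers")
    for name, value in msg.items():
        header = ET.SubElement(headers, "Header", name=name)
        header.text = str(value)

    # Keep only the plain-text body; attachments and HTML parts are ignored.
    body_part = msg.get_body(preferencelist=("plain",))
    body = ET.SubElement(root, "Body", contentType="text/plain")
    body.text = body_part.get_content() if body_part is not None else ""
    return root


# Hypothetical sample message.
raw = (
    "From: press.office@example.gov\n"
    "To: governor@example.gov\n"
    "Subject: Schedule for Tuesday\n"
    "\n"
    "Draft schedule attached for review.\n"
)
print(ET.tostring(message_to_xml(raw), encoding="unicode"))
```

A real preservation workflow would also need to capture attachments, multipart structure and the original message bytes, which is exactly what a published schema is for.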

I have also visited the Smithsonian Institution Archives, which has developed some automated tools to help with processing e-mail archives and hopes to make them available on its website in due course. The CERP project will be of particular interest to UK local archives, since this work has been carried out with an emphasis on low-cost solutions suitable for small and medium-sized organisations.

An ongoing sub-theme of my Fellowship has been to look at where success in digital preservation has come by means of collaborative partnerships, and to investigate how communities of shared practice can be built up and best practice ideas exchanged.

The Best Practice Exchanges take place annually, and provide a forum for those working on digital information management initiatives in US State government to meet and discuss issues, challenges and potential solutions.  All of the State Archives I’ve visited on my Fellowship have participated at one time or another.  The Exchanges are hosted and organised by volunteer States on a cost recovery basis.  The sessions are run more informally than a traditional conference, with a facilitator to encourage discussion in small groups.

Lots of ideas here for future training workshops in the UK?

Visiting Arizona was a useful way of pulling together many of the strands of what I’ve learnt so far. I was particularly interested in the Persistent Digital Archives and Library System (PeDALS) project, which aims to create an automated workflow for processing digital collections, but also to keep costs as low as possible in an effort to reduce the barriers to addressing the challenges of digital preservation.

The automation aim is of course shared with another of the State Government NDIIPP projects at Washington State Digital Archives, and there are indeed some conceptual similarities in the workflow. However, PeDALS also makes use of a LOCKSS (Lots of Copies Keeps Stuff Safe) private network to provide inexpensive storage with plenty of redundancy and automatic error detection and correction. Having visited the LOCKSS team earlier in my Fellowship, I was curious to see how this system (originally designed to enable libraries to collect and preserve local copies of material published on the internet) could be implemented in an archival context.
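As a rough illustration of how keeping several copies allows automatic error detection and correction, the Python sketch below compares checksums of the copies of one file held at different nodes and repairs any copy that disagrees with the majority. This is a deliberate simplification: the real LOCKSS polling protocol between peers is considerably more sophisticated, and the node names and the function `audit_and_repair` are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fixity value for one copy of a file."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(replicas: dict) -> dict:
    """Compare copies of one file held at different nodes and repair any
    copy that disagrees with the majority.

    `replicas` maps a (hypothetical) node name to the bytes that node holds.
    """
    votes = Counter(digest(content) for content in replicas.values())
    winning_digest, count = votes.most_common(1)[0]
    # Require a strict majority; otherwise there is no trustworthy reference copy.
    if count <= len(replicas) // 2:
        raise ValueError("no majority agreement between copies; needs manual review")
    good_copy = next(c for c in replicas.values() if digest(c) == winning_digest)
    # Keep agreeing copies as they are; replace damaged ones with a good copy.
    return {
        node: content if digest(content) == winning_digest else good_copy
        for node, content in replicas.items()
    }


# Hypothetical example: one of three nodes holds a corrupted copy.
copies = {"node_a": b"minutes 1998", "node_b": b"minutes 1998", "node_c": b"minutes 1W98"}
print(audit_and_repair(copies))
```

The more independent copies there are, the more damaged copies the network can tolerate before a majority is lost, which is the intuition behind the LOCKSS name.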

The envisaged workflow for PeDALS works best when there are clear series of records; in other words, it should work pretty well for government record series, but less well for miscellaneous private and personal accessions. This is because the system is based upon the application of systematic ‘business rules’ to process large sets of similar records in the most efficient way possible. This programming work could only be justified where there are sufficient records of a similar type, being created as the result of a routine process. As has become something of a theme in most of the operational digital archives I have visited, the PeDALS team originally intended to focus on born-digital records but has found that many routine processes are still embedded in a paper system, and hence is currently working primarily with digitised records.

The current phase of the collaborative, inter-State NDIIPP PeDALS project is looking at writing these business rules and setting up the PeDALS workflow and storage systems.  Without going into all of this in a tremendous amount of detail (I’d suggest a look at the PeDALS website for further details), the basic idea is to write the rules once and then allow individual participants in the network to tweak them to suit their local circumstances.
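The ‘write once, tweak locally’ idea can be sketched in a few lines of Python: a shared rule set for a record series, a participant’s local overrides layered on top, and a check of an incoming record against the result. The rule names, series and values here are hypothetical and are not taken from PeDALS.

```python
# Hypothetical rule set for one record series, written once and shared
# across the network. Field names and values are illustrative only.
SHARED_RULES = {
    "series": "board_minutes",
    "required_fields": ["title", "date", "body"],
    "retention": "permanent",
    "access": "open",
}


def local_rules(shared: dict, overrides: dict) -> dict:
    """Return a participant's rule set: the shared rules with local tweaks applied."""
    unknown = set(overrides) - set(shared)
    if unknown:
        # Only allow tweaks to rules that already exist, so local variants stay comparable.
        raise KeyError(f"unknown rule(s): {sorted(unknown)}")
    return {**shared, **overrides}


def apply_rules(record: dict, rules: dict) -> dict:
    """Check one record against the rules and attach the resulting decisions."""
    missing = [f for f in rules["required_fields"] if f not in record]
    return {
        "series": rules["series"],
        "valid": not missing,
        "missing": missing,
        "retention": rules["retention"],
        "access": rules["access"],
    }


# One participant keeps the shared rules but restricts access locally.
state_rules = local_rules(SHARED_RULES, {"access": "reading_room_only"})
print(apply_rules({"title": "Board minutes", "date": "2008-06-01"}, state_rules))
```

Refusing overrides for rules that do not exist in the shared set is one way (an assumption, not a documented PeDALS feature) of keeping each participant’s local variant recognisably the same rule set.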

Whilst the system is still very much in the early stages of being built, the project is definitely worth colleagues in the UK local archives network keeping an eye on, not least because of its emphasis on keeping costs down. As well as the main project website, there is an update log at https://pedals.updatelog.com/login (you need to register for a username and password).

I hope many archivists working in local authority services in the UK have completed the survey on digital preservation which is currently running.  The  results of the survey will be fed into an open consultation event to be held at The National Archives on 12 November 2008.  Those of us who have been working on the survey and event planning are hoping that this will provide a first step towards a new alliance of interested organisations in the UK to co-ordinate action on digital preservation.

Throughout my Fellowship, I will be encountering examples of successful partnerships which have attempted to address the challenges of digital preservation.  In Australia, I was particularly interested in two partnerships – the Australasian Digital Recordkeeping Initiative (ADRI) and the Australian Partnership for Sustainable Repositories (APSR).

ADRI is the partnership most immediately applicable to the local authority sector in the UK, as it is an initiative formed solely of public record keeping authorities across Australia and New Zealand.  Both initiatives, however, identified similar strengths and aims:

  • enabling information sharing on best practice
  • offering encouragement, support and reassurance to practitioners (archivists, librarians) and external stakeholders (eg record creators, users, government) alike
  • identifying areas of joint interest
  • providing a framework for recognition of partners’ work on new models and paradigms for digital preservation (eg testbed software solutions, model business cases, proposed standards for digital preservation)

Both projects also rely heavily on practical contributions from their member organisations, yet emphasise that these are generally projects which the members would be committed to doing anyway. The benefit to the community comes from pooling these resources towards a common Australasian approach to digital preservation and access within their respective communities (public records bodies in the case of ADRI; university libraries in the case of APSR).

National Library of Australia

The National Library of Australia (NLA) began its web archiving project, PANDORA, in 1996, and the current team consists of four members of staff. The NLA’s web archiving programme is selective, in contrast to approaches in the Scandinavian countries in particular, where the aim has been to harvest the entirety of the country’s web domain. The decision to make selective harvests only was resource-driven, since there was no extra funding available, although the Library is now doing periodic .au domain harvests in conjunction with the Internet Archive. These .au domain harvests have been commissioned since 2005, and the fourth harvest is due this year. It will run over four weeks initially and is expected to capture around 1 billion files, comprising around 40 TB of data. The Internet Archive manages the harvest and carries out full-text indexing; the results will be shipped to the NLA and maintained on NLA servers, although copies will also be available via the Wayback Machine (without the full-text indexing).

The name ‘PANDORA Archive’ is acknowledged to cause some confusion, particularly within the Australian government, and the Library accepts that it is not in fact carrying out an archival role in the traditional sense. Rather, PANDORA is a web collection, a snapshot of a point in time, a representation of what the NLA feels is important in the Australian web domain. PANDORA doesn’t meet recordkeeping needs for recording business transactions; the websites are harvested purely for their content, and there is some leeway in the accuracy of dates of collection. For example, a site will be timestamped when it is harvested, but the NLA then performs further quality control on the harvested site, which may take up to a week to complete.

That said, the websites are harvested with the intention that the NLA will attempt to keep them in perpetuity, and permission is sought from website publishers for collection, preservation and public access with this in mind. Legal deposit legislation does not (yet) cover electronic information in Australia, so considerable effort is required to obtain the relevant permissions from the website publisher (though not from every contributor). Access restrictions are applied in certain circumstances, for instance where there is a commercial interest involved. Access can be restricted in several ways: (a) for a set period of time following archiving, (b) for specific dates, (c) by use of authenticated logins, or (d) to one PC in the NLA’s reading room. One of the problems the NLA identifies with its current harvesting software is that restrictions cannot be applied at file level; currently, access restrictions can only be specified for a whole website.
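The four restriction types listed above can be expressed as a simple access check. This Python sketch is hypothetical and is not how PANDAS implements restrictions; it models (b) as a closed period between two dates, which is only one reading of ‘specific dates’. Mirroring the limitation described, a `Restriction` here is attached to a whole website rather than to individual files.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Restriction:
    """Hypothetical access conditions attached to a whole archived website."""
    embargo_days: int = 0                # (a) closed for a set period after archiving
    closed_from: Optional[date] = None   # (b) closed between specific dates (assumed reading)
    closed_to: Optional[date] = None
    login_required: bool = False         # (c) authenticated users only
    reading_room_only: bool = False      # (d) one reading-room PC only


def can_view(r: Restriction, archived_on: date, today: date,
             authenticated: bool, in_reading_room: bool) -> bool:
    """Return True only if every restriction on the site is satisfied."""
    if today < archived_on + timedelta(days=r.embargo_days):
        return False
    if r.closed_from and r.closed_to and r.closed_from <= today <= r.closed_to:
        return False
    if r.login_required and not authenticated:
        return False
    if r.reading_room_only and not in_reading_room:
        return False
    return True


# Hypothetical example: a commercial site embargoed for a year after capture.
site = Restriction(embargo_days=365)
print(can_view(site, date(2008, 3, 1), date(2008, 9, 1), False, False))
print(can_view(site, date(2008, 3, 1), date(2009, 3, 2), False, False))
```

Supporting file-level restrictions would mean attaching a `Restriction` to each file (or path pattern) instead of to the site, which is the finer tuning the NLA says its current software lacks.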

The selection guidelines used are under review at the moment. The current priorities include major events (for example, coverage of Australian elections) but can basically cover any original, high quality content not available in print. The websites harvested range from academic e-journals to blogs. PANDORA is just moving into Web 2.0 harvesting, although they have already captured many blogs, some MySpace pages and some online video.

A PANDORA ‘title’ might be anything from a single PDF document to a whole website or part of one. A particular website might also be harvested at scheduled intervals, with the time between captures depending on how regularly the site is updated, whether content is periodically removed as well as added, and the general stability of the organisation publishing the website. The harvest interval is re-assessed at each harvest. Currently the most common intervals are between six months and one year; organisationally, it is more efficient to carry out captures less frequently.

The PANDORA archive currently holds around 2TB of data, consisting of around 20,000 titles and 40,000 harvested instances.

Of most interest vis-a-vis local archive services in the UK, PANDORA has nine partners in State Libraries and other cultural organisations, who can define what they require to be collected via a web browser interface to PANDORA’s in-house harvesting tool, PANDAS. Librarians in partner institutions can also log in to fix minor problems with harvests or log more significant issues for the team at NLA to resolve. Most of the actual capture work, however, is carried out by the team at NLA.

Whilst the PANDORA team has a library background, a certain level of technical skill is acknowledged to be required. That said, other than the quality control work carried out on each harvested title, little post-processing is currently carried out specifically to promote the longevity of the stored files. Three copies are created: a preservation master (the original files as harvested), a display master (which includes any quality control changes), and a metadata master. A display copy is then generated from the display master.
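The relationship between the three stored copies and the derived display copy can be sketched as follows. This is an assumed, simplified model rather than PANDORA’s actual storage layout: file contents are held in dictionaries keyed by path, and the contents of the metadata master (title ID, harvest date, list of corrected files, SHA-256 checksums) as well as the function names `make_masters` and `display_copy` are illustrative guesses.

```python
import hashlib


def make_masters(harvested: dict, qc_fixes: dict, title_id: str, harvested_on: str) -> dict:
    """Build the three stored versions of one harvested title instance.

    `harvested` and `qc_fixes` map (hypothetical) file paths to file contents (bytes).
    """
    preservation_master = dict(harvested)          # files exactly as harvested, never edited
    display_master = {**harvested, **qc_fixes}     # harvested files with quality-control fixes applied
    metadata_master = {
        "title_id": title_id,
        "harvested_on": harvested_on,
        "qc_changed_files": sorted(qc_fixes),
        # Fixity values let later audits detect any change to the original files.
        "preservation_sha256": {
            path: hashlib.sha256(data).hexdigest()
            for path, data in preservation_master.items()
        },
    }
    return {
        "preservation": preservation_master,
        "display": display_master,
        "metadata": metadata_master,
    }


def display_copy(masters: dict) -> dict:
    """The public display copy is generated from the display master only."""
    return dict(masters["display"])


# Hypothetical harvest where quality control repaired one broken link.
site = {"index.html": b"<a href='broken'>", "logo.png": b"\x89PNG"}
masters = make_masters(site, {"index.html": b"<a href='fixed'>"}, "title-001", "2008-09-01")
print(sorted(display_copy(masters)))
```

Keeping the untouched preservation master alongside the corrected display master means quality-control edits never overwrite what was actually captured.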

A couple of articles in the most recent edition of the International Journal of Digital Curation caught my eye this week as I prepare for my forthcoming Winston Churchill Memorial Fellowship to Australia and the US.

Martha Anderson reviews the evolution of the National Digital Information Infrastructure and Preservation Program initiated by the Library of Congress, and draws some conclusions about lessons learned, many of which will be familiar to those of us working within existing partnership organisations, such as West Yorkshire Joint Services. The layered stewardship model introduced in the paper is nevertheless a useful concept to bear in mind as the UK archive sector begins to build our own national network of diverse stakeholders to tackle the digital preservation challenge. The full paper is available at http://www.ijdc.net/ijdc/article/view/59/60.

David Pearson and Colin Webb discuss issues of file format obsolescence and introduce the AONS II Project, something I hope to find out more about when I visit the National Library of Australia in September. The project aimed to develop a software tool that would find and report indicators of obsolescence risks. It will be interesting to see how this work fits with the European Planets project and its PLATO preservation planning tool. The IJDC paper can be found at http://www.ijdc.net/ijdc/article/view/76/78.

I see more papers have appeared on the PeDALS project website in Arizona too – plenty of reading to get through…
