Posts Tagged ‘National Library of Australia’

One noteworthy factor about several of the digital preservation initiatives I’m visiting during my Churchill Fellowship is how each approach is underpinned by a certain philosophical world view.

For NLA, a key challenge for the digital preservation community is sustainability:

  • The community needs to know as much about routes which haven’t worked as those which have.
  • How do the parts of the preservation puzzle fit together?  Which parts of the puzzle have still to be solved?  How do we co-ordinate the game?
  • Could we make better use of informal knowledge from enthusiasts?  We should recognise that we can’t be experts in everything (and that we can’t preserve everything – a principle most archivists should be happy enough with).
  • Perhaps we are better at digital preservation than we think we are, but merely lack confidence in presenting this to management.


In my previous post, I recommended working with depositors to explain the issues of digital preservation and to suggest simple steps for creating and curating digital records with a view to their long-term preservability.

I guess it would be correct to say though that many local archives staff do not feel confident in giving such advice. Although many UK local archives have been involved in digitisation programmes, much of this work has been outsourced and the funding has rarely extended to longer-term preservation of the digital assets created. In all likelihood, the most pressing digital preservation issue facing most UK local archive services is in fact an ever-growing output of CDs and DVDs from their own digitisation initiatives.

The NLA’s Digitisation of Heritage Materials training course is designed to help organisations with very limited resources design and run an in-house digitisation programme, using free or inexpensive software and hardware. It will be of interest to many UK local authority archive services for just these reasons. However, because it also covers sustainability issues – image file formats suitable for long-term preservation, data storage and backup, legal issues etc. – the course might equally well serve as an introduction to the digital preservation of images. It even includes some free software.

Well worth a look.


National Library of Australia


The National Library of Australia (NLA) began their web archiving project, PANDORA, in 1996, and the current team consists of four members of staff. The NLA’s web archiving programme is selective, contrasting with approaches in the Scandinavian countries in particular, where the aim has been to harvest the entirety of the country’s web domain. The decision to make selective harvests only was resource driven, since there was no extra funding available, although the Library are now doing periodic .au domain harvests in conjunction with the Internet Archive. These domain harvests have been commissioned since 2005, and the fourth is due this year. It will run over four weeks initially, and capture an anticipated 1 billion files, comprising around 40 TB of data. The Internet Archive manage the harvest and carry out a full text index; the results will be shipped to NLA and maintained on NLA servers, although copies will also be available via the Wayback Machine (without the full text indexing).

The terminology ‘PANDORA Archive’ is acknowledged to cause some confusion, particularly within the Australian government, and the Library acknowledge that they are not in fact carrying out an archive role in the traditional sense. Rather, PANDORA is a web collection, a snapshot of a point in time, a representation of what the NLA feels is important in the Australian web domain. PANDORA doesn’t meet recordkeeping needs for recording business transactions; the websites are harvested purely for their content and there is some leeway in the accuracy of dates of collection – for example, a site will be timestamped when it is harvested, but the NLA then perform further quality controls on the harvested site which may take up to a week to complete.

That said, the websites are harvested with the intention that the NLA will attempt to keep them in perpetuity, and permission is sought from website publishers for collection, preservation and public access with this in mind. Legal deposit legislation does not (yet) cover electronic information in Australia, and considerable effort is therefore required in obtaining the relevant permissions from the website publisher (but not from every contributor). Access restrictions are applied in certain circumstances – for instance, where there is a commercial interest involved. Access can be restricted in several different ways: (a) for a set period of time following archiving, (b) for specific dates, (c) by requiring an authenticated login, or (d) by limiting access to one PC in the NLA’s reading room. One of the problems NLA identify with their current harvesting software is that the restriction mechanism is not fine-grained enough to work at file level – currently access restrictions can only be specified for a whole website.
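To make the four restriction options concrete, here is a minimal sketch in Python. It is not how PANDAS actually implements them: the `Restriction` class, its field names and the `can_access` function are all hypothetical, and I have assumed that option (b) means blocking access between two given dates. The restriction is attached to a whole title, mirroring the whole-website limitation described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Tuple


@dataclass
class Restriction:
    """Hypothetical whole-title restriction; every field name is illustrative."""
    embargo_days: Optional[int] = None                 # (a) set period after archiving
    blocked_range: Optional[Tuple[date, date]] = None  # (b) specific dates (assumed: inclusive range)
    login_required: bool = False                       # (c) authenticated login
    reading_room_only: bool = False                    # (d) single reading-room PC


def can_access(r: Restriction, archived_on: date, today: date,
               logged_in: bool = False, in_reading_room: bool = False) -> bool:
    """Return True only if none of the configured restrictions blocks access."""
    if r.embargo_days is not None:
        if today < archived_on + timedelta(days=r.embargo_days):
            return False
    if r.blocked_range is not None:
        start, end = r.blocked_range
        if start <= today <= end:
            return False
    if r.login_required and not logged_in:
        return False
    if r.reading_room_only and not in_reading_room:
        return False
    return True


# Example: a title archived on 1 January 2008 with a 30-day embargo.
embargoed = Restriction(embargo_days=30)
print(can_access(embargoed, date(2008, 1, 1), date(2008, 1, 15)))
print(can_access(embargoed, date(2008, 1, 1), date(2008, 2, 15)))
```

Restricting at file level, which the NLA would like, would mean attaching a rule like this to each harvested file rather than to the title as a whole.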

The selection guidelines used are under review at the moment. The current priorities include major events (for example, coverage of Australian elections) but can basically cover any original, high quality content not available in print. The websites harvested range from academic e-journals to blogs. PANDORA is just moving into Web 2.0 harvesting, although they have already captured many blogs, some MySpace pages and some online video.

A PANDORA ‘title’ might be anything from a single PDF document to a whole or part website. A particular website might also be harvested at scheduled intervals, with the time between captures depending on how regularly the site is updated, whether content is periodically removed as well as new content added, and the general stability of the organisation publishing the website. The harvest interval is re-assessed at each harvest. Currently the most common harvest intervals are between 6 months and 1 year. Organisationally, it is more efficient to carry out captures less frequently.

The PANDORA archive currently holds around 2 TB of data, consisting of around 20,000 titles and 40,000 harvested instances.
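For a rough sense of scale, the averages implied by these figures (and by the 1 billion files and 40 TB anticipated for the domain harvest) can be worked out in a few lines of Python. This is back-of-envelope arithmetic only: I have assumed decimal terabytes, and the quoted figures are themselves approximate.

```python
TB = 10**12  # assumption: decimal terabytes (10^12 bytes)


def average_size(total_bytes: int, count: int) -> float:
    """Average size in bytes of one item in a collection."""
    return total_bytes / count


# Selective PANDORA collection: about 2 TB across 40,000 harvested instances.
pandora_avg = average_size(2 * TB, 40_000)
instances_per_title = 40_000 / 20_000

# Anticipated .au domain harvest: about 40 TB across 1 billion files.
domain_avg = average_size(40 * TB, 1_000_000_000)

print(f"{pandora_avg / 10**6:.0f} MB per harvested instance")
print(f"{instances_per_title:.0f} instances per title on average")
print(f"{domain_avg / 10**3:.0f} KB per file in the domain harvest")
```

The two averages are not directly comparable, since a PANDORA instance is a whole harvested title (often many files) whereas the domain harvest figure is per file.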

Of most interest vis-a-vis local archive services in the UK, PANDORA has nine partners in State Libraries and other cultural organisations, who can define what they require to be collected via a web browser interface to PANDORA’s in-house harvesting tool, PANDAS. Librarians in partner institutions can also log in to fix minor problems with harvests or log more significant issues for the team at NLA to resolve. Most of the actual capture work, however, is carried out by the team at NLA.

Whilst the PANDORA team has a library background, it is noted that a certain level of technical skill is required. That said, other than the quality control work carried out on each harvested title, little post-processing is currently carried out specifically to promote the longevity of the stored files. Three copies are created – a preservation master (the original files as harvested), a display master (which includes any quality control changes), and a metadata master. A display copy is then generated from the display master.


A couple of articles in the most recent edition of the International Journal of Digital Curation caught my eye this week as I prepare for my forthcoming Winston Churchill Memorial Fellowship to Australia and the US.

Martha Anderson reviews the evolution of the National Digital Information Infrastructure and Preservation Program initiated by the Library of Congress, and draws some conclusions about lessons learned, many of which will be familiar to those of us working within existing partnership organisations, such as West Yorkshire Joint Services. The layered stewardship model introduced in the paper is nevertheless a useful concept to bear in mind as the UK archive sector begins to build our own national network of diverse stakeholders to tackle the digital preservation challenge. The full paper is available at http://www.ijdc.net/ijdc/article/view/59/60.

David Pearson and Colin Webb discuss issues of file format obsolescence and introduce the AONS II Project, something I hope to find out more about when I visit the National Library of Australia in September. The project aimed to develop a software tool that would find and report indicators of obsolescence risks. It will be interesting to see how this work fits with the European Planets Project and their PLATO preservation planning tool. The IJDC paper can be found at http://www.ijdc.net/ijdc/article/view/76/78.

I see more papers have appeared on the PeDALS project website in Arizona too – plenty of reading to get through…
