Posts Tagged ‘Australia’

I hope many archivists working in local authority services in the UK have completed the survey on digital preservation which is currently running.  The results of the survey will be fed into an open consultation event to be held at The National Archives on 12 November 2008.  Those of us who have been working on the survey and event planning are hoping that this will provide a first step towards a new alliance of interested organisations in the UK to co-ordinate action on digital preservation.

Throughout my Fellowship, I will be encountering examples of successful partnerships which have attempted to address the challenges of digital preservation.  In Australia, I was particularly interested in two partnerships – the Australasian Digital Recordkeeping Initiative (ADRI) and the Australian Partnership for Sustainable Repositories (APSR).

ADRI is the partnership most immediately applicable to the local authority sector in the UK, as it is an initiative formed solely of public record keeping authorities across Australia and New Zealand.  Both initiatives, however, identified similar strengths and aims:

  • enabling information sharing on best practice
  • offering encouragement, support and reassurance to practitioners (archivists, librarians) and external stakeholders (eg record creators, users, government) alike
  • identifying areas of joint interest
  • providing a framework for recognition of partners’ work on new models and paradigms for digital preservation (eg testbed software solutions, model business cases, proposed standards for digital preservation)

Both projects also rely heavily on practical contributions from their member organisations, yet emphasise that these are generally projects which the members would be committed to doing anyway.  The benefit to the community comes from pooling these resources towards a common Australasian approach to digital preservation and access within their respective communities (public records bodies in the case of ADRI; University Libraries in the case of APSR).

One noteworthy factor about several of the digital preservation initiatives I’m visiting during my Churchill Fellowship is how each approach is underpinned by a certain philosophical world view.

For NLA, a key challenge for the digital preservation community is sustainability:

  • The community needs to know as much about routes which haven’t worked as those which have.
  • How do the parts of the preservation puzzle fit together?  Which parts of the puzzle have still to be solved?  How do we co-ordinate the game?
  • Could we make better use of informal knowledge from enthusiasts?  We should recognise that we can’t be experts in everything (and that we can’t preserve everything – a principle most archivists should be happy enough with).
  • Perhaps we are better at digital preservation than we think we are, but merely lack confidence in presenting this to management.

In my previous post, I recommended working with depositors to explain the issues of digital preservation and to suggest simple steps for creating and curating digital records with a view to their long-term preservability.

I guess it would be correct to say, though, that many local archives staff do not feel confident in giving such advice. Although many UK local archives have been involved in digitisation programmes, much of this work has been outsourced and the funding has rarely extended to longer-term preservation of the digital assets created. In all likelihood, the most pressing digital preservation issue facing most UK local archive services is in fact an ever-growing output of CDs and DVDs from their own digitisation initiatives.

The NLA’s Digitisation of Heritage Materials training course is designed to help organisations with very limited resources design and run an in-house digitisation programme, using free or inexpensive software and hardware. It will be of interest to many UK local authority archive services for just these reasons. However, because it also covers sustainability issues – image file formats suitable for long-term preservation, data storage and backup, legal issues etc. – the course might equally well serve as an introduction to the digital preservation of images. It even includes some free software.

Well worth a look.

National Library of Australia

The National Library of Australia (NLA) began their web archiving project, PANDORA, in 1996, and the current team consists of four members of staff. The NLA’s web archiving programme is selective, contrasting with approaches in the Scandinavian countries in particular where the aim has been to harvest the entirety of the country’s web domain. The decision to make selective harvests only was resource driven, since there was no extra funding available, although the Library are now doing periodic .au domain harvests in conjunction with the Internet Archive. These domain harvests have been commissioned since 2005, and the 4th harvest is due this year. It will run over 4 weeks initially, and capture an anticipated 1 billion files, comprising around 40 TB of data. The Internet Archive manage the harvest and carry out a full text index; the results will be shipped to NLA and maintained on NLA servers, although copies will also be available via the Wayback Machine (without the full text indexing).

The terminology ‘PANDORA Archive’ is acknowledged to cause some confusion, particularly within the Australian government, and the Library acknowledge that they are not in fact carrying out an archive role in the traditional sense. Rather, PANDORA is a web collection, a snapshot of a point in time, a representation of what the NLA feels is important in the Australian web domain. PANDORA doesn’t meet recordkeeping needs for recording business transactions; the websites are harvested purely for their content and there is some leeway in the accuracy of dates of collection – for example, a site will be timestamped when it is harvested, but the NLA then perform further quality controls on the harvested site which may take up to a week to complete.

That said, the websites are harvested with the intention that the NLA will attempt to keep them in perpetuity, and permission is sought from website publishers for collection, preservation and public access, with this in mind. Legal deposit legislation does not (yet) cover electronic information in Australia, and considerable effort is therefore required in obtaining the relevant permissions from the website publisher (but not from every contributor). Access restrictions are applied in certain circumstances – for instance, where there is a commercial interest involved. Access can be restricted in several different ways – (a) for a set period of time following archiving, (b) for specific dates, (c) by use of authenticated logins, (d) access restricted to one PC in the NLA’s reading room. One of the problems NLA identify with their current harvesting software is that the restriction mechanism is not sufficiently finely tuned to file level – currently access restrictions can only be specified on a whole website.

The selection guidelines used are under review at the moment. The current priorities include major events (for example, coverage of Australian elections) but can basically cover any original, high quality content not available in print. The websites harvested range from academic e-journals to blogs. PANDORA is just moving into Web 2.0 harvesting, although they have already captured many blogs, some MySpace pages and some online video.

A PANDORA ‘title’ might be anything from a single PDF document to a whole or part website. A particular website might also be harvested at scheduled intervals, with the time between captures depending on how regularly the site is updated, whether content is periodically removed as well as new content added, and the general stability of the organisation publishing the website. The harvest interval is re-assessed at each harvest. Currently the most common harvest intervals are between 6 months and 1 year. Organisationally, it is more efficient to carry out captures less frequently.

The PANDORA archive currently holds around 2TB of data, consisting of around 20,000 titles and 40,000 harvested instances.

Of most interest vis-a-vis local archive services in the UK, PANDORA has nine partners in State Libraries and other cultural organisations, who can define what they require to be collected via a web browser interface to PANDORA’s in-house harvesting tool, PANDAS. Librarians in partner institutions can also log in to fix minor problems with harvests or log more significant issues for the team at NLA to resolve. Most of the actual capture work, however, is carried out by the team at NLA.

Whilst the PANDORA team has a library background, it is noted that a certain level of technical skill is required. That said, other than the quality control work carried out on each harvested title, little post-processing is currently carried out specifically to promote the longevity of the stored files. Three copies are created – a preservation master (the original files as harvested), a display master (which includes any quality control changes), and a metadata master. A display copy is then generated from the display master.

Operating the Digital Archive

As previously posted, the operation of the PROV Digital Archive is well integrated into the wider organisation, with the same team responsible for transfers of both paper and digital records. This team also creates the disposal authorities (more commonly known as ‘retention schedules’ in the UK – is the different terminology significant??!) for all Agencies within the State of Victoria.

Digital records are only accepted into the Archive if they are VERS compliant, and the Agency’s recordkeeping system can produce VEOs according to the standard mandated under the Victorian (as in ‘State of…’) Public Records Act.

This is obviously a strong advantage for PROV, and not a requirement which can easily be translated into the UK local authority archives context. However, it is worth noting that despite the relative strength of their archival legislation, PROV staff still put considerable effort into consulting with Agencies and carrying out pilot transfers. The team at PROV have noticed that it is harder to encourage deposit in a digital world, whereas historically a lack of physical space for keeping records often triggered transfers to the archives. Whereas traditionally the transfer process was client driven, commencing with an Agency request, PROV are now trying to move towards a programmed transfer timetable for both paper and digital records. PROV are trying to sell this to the Agencies as being cheaper and easier than ad hoc clear-outs of records.

There are in any case many similarities in dealing with transfers of records to the archives whatever the format of the records. PROV needs to maintain intellectual control over the records series, and descriptive lists need to be produced. Background information on provenance and access arrangements or restrictions is gathered prior to transfer by PROV staff through site visits or, increasingly, formalised documentation. The Agency staff are responsible for producing a ‘manifest’ listing the records being transferred. PROV provides advice and training on the process of preparing digital records for transfer, and transfer guidelines are published on the PROV website. Digital archives may be transferred on CD, hard drive or copied remotely into the Digital Archive inbox (though few Agencies have yet taken advantage of this method of transfer, preferring to follow the paper paradigm and copy records onto CD much as they would package paper records into boxes).

The system of intellectual control (assigning of unique identifiers etc.) for digital archives follows much the same pattern as for paper records. My feeling is that Australian practice in the use of consignments and the series system makes this simpler to implement than with the UK practice using accession numbers and hierarchical cataloguing, although clearly we in the UK need to take some time, as did PROV with the revision of their Archival Control Model, to consider how to integrate digital archives into key archival processes.

Where do PROV themselves hope to see improvements? Dealing with digital has highlighted an internal need for improved written procedures for dealing with transfers, whether in paper or digital formats. New staff need to be trained to operate the Digital Archive interface (a heavily customised version of Documentum). Improved guidelines are also needed to help Agencies, and in particular Agency IT staff who are most likely not familiar with archival practices and terminology. One of the technical support staff at PROV pointed out that ‘file’ in IT terms has potentially a completely different meaning to the archival ‘file’. Language needs to be translated into terms with which Agency staff are familiar.

Once the digital records arrive at PROV, the manifest is loaded into the Digital Archive system and checked against the records actually received. The records are checked to ensure that they are valid VEOs and that they are virus-free. Various errors can be picked up at this stage – duplicate records, extra records received or too few, problems with the digital signature etc. Simple errors can be fixed by PROV staff, but in general it has been found best to request the Agency to resubmit the whole transfer. The records remain in ‘quarantine’ for seven days, before the checking process is re-run. If successful, the transfer can be finalised and the records become viewable through the PROV online catalogue.
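
For readers who like to see such a check spelled out, the manifest comparison described above might be sketched roughly as follows. This is only an illustration, not PROV's actual system: the function name, the report structure and the record identifiers are all hypothetical, and the sketch covers only the counting side of the check (missing, extra and duplicate records), not VEO validation, digital signatures or virus scanning.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ReconciliationReport:
    """Outcome of comparing a manifest with the records received (hypothetical)."""
    missing: list[str] = field(default_factory=list)     # in manifest, not received
    unexpected: list[str] = field(default_factory=list)  # received, not in manifest
    duplicates: list[str] = field(default_factory=list)  # received more than once

    @property
    def ok(self) -> bool:
        # The transfer passes this check only if nothing was flagged.
        return not (self.missing or self.unexpected or self.duplicates)


def reconcile(manifest_ids, received_ids):
    """Compare identifiers listed in the manifest with those actually received."""
    received_counts = Counter(received_ids)
    manifest_set = set(manifest_ids)
    received_set = set(received_counts)
    return ReconciliationReport(
        missing=sorted(manifest_set - received_set),
        unexpected=sorted(received_set - manifest_set),
        duplicates=sorted(i for i, n in received_counts.items() if n > 1),
    )


# Made-up identifiers, purely for illustration.
manifest = ["VEO-001", "VEO-002", "VEO-003"]
received = ["VEO-001", "VEO-001", "VEO-003", "VEO-999"]
report = reconcile(manifest, received)
print(report.ok, report.missing, report.unexpected, report.duplicates)
```

In this sketch any flagged problem fails the whole transfer, which fits the observation that it is generally best to ask the Agency to resubmit everything rather than patch individual records.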

The first pilot transfers to the Digital Archive took place in 2005. The largest accession so far has in fact been digital surrogates from PROV’s own digitisation programme, although another major and ongoing project is the archives of the Melbourne 2006 Commonwealth Games. This has brought its own unique challenges in working with a project organisation in the process of being wound down (for example, password protected records which cannot be processed into VEOs have had to be ignored).
