Archive for the ‘Operational Digital Archives’ Category

It’s been a while since I’ve posted here purely on digital preservation issues: my work has moved in other directions, although I did attend a number of the digital preservation sessions at the Society of American Archivists’ conference this summer.  I retain a keen interest in digital preservation, however, particularly in developments which might be useful for smaller archives.  Recently, I’ve been engaged in a little work for a project called DiSARM (Digital Scenarios for Archives and Records Management), preparing some teaching materials for the Masters students at UCL to work from next term, and in revising the contents of a guest lecture I present to the University of Liverpool MARM students on ‘Digital Preservation for the Small Repository’.  Consequently, I’ve been trying to catch up on the last couple of years (since I left West Yorkshire Archive Service at the end of 2009) of new digital preservation projects and research.

So what’s new?  Well, from a small archives perspective, I think the key development has been the emergence of several digital curation workflow management systems – Archivematica, Curator’s Workbench, the National Archives of Australia’s Digital Preservation Software Platform (others…?) – which package together a number of different tools to guide the archivist through a sequenced set of stages for the processing of digital content.  The currently available systems vary in their approaches to preservation, comprehensiveness, and levels of maturity, but represent a major step forward from the situation just a couple of years ago.  In 2008, if (like me when WYAS took in the MLA Yorkshire archive as a testbed) you didn’t have much (or any) money available, your only option was – as one of the former Liverpool students memorably pointed out to me – to cobble together a set of tools as best you could from old socks and a bit of string.  Now we have several offerings approaching an integrated software solution; moreover, these packages are generally open source and freely available, so would-be adopters are able to download each one and play about with it before deciding which one might suit them best.

Having said that, I still think it is important that students (and practitioners, of course) understand the preservation strategies and assumptions underlying each software suite.  When we learn how to catalogue archives, we are not trained merely to use a particular software tool.  Rather, we are taught the principles of archival description, and then we move on to see how these concepts are implemented in practice in EAD or by using specific database applications, such as (in the U.K.) CALM or Adlib.  For DiSARM, students will design a workflow and attempt to process a small sample set of digital documents using their choice of one or more of the currently available preservation tools, which they will be expected to download and install themselves.  This Do-It-Yourself approach will mirror the practical reality in many small archives, where the (frequently lone) archivist often has little access to professional IT support.  Similarly, students at UCL are not permitted to install software onto the university network.  Rather than see this as a barrier, again I prefer to treat this situation as a reflection of organisational reality.  There are a number of very good reasons why you would not want to process digital archives directly onto your organisation’s internal network, and recycling or re-purposing old computer equipment of varying technical specifications and capabilities to serve as workstations for ingest is a fact of life even, it seems, for Mellon-funded projects!

In preparation for writing this DiSARM task, I began to put together for my own reference a spreadsheet listing all the applications I could think of, or have heard referenced recently, which might be useful for preservation processing tasks in small archives.  I set out to record:

  • the version number of the latest (stable) release
  • the licence arrangements for each tool
  • the URL from which the software can be downloaded
  • basic system requirements (essentially the platform(s) on which the software can be run – we have surveyed the class and know there is a broad range of operating systems in use, including several flavours of both Linux and Windows, and Mac OS X)
  • location of further documentation for each application
  • end-user support availability (forums or mailing lists etc)

This all proved surprisingly difficult.  I was half expecting that user-friendly documentation and (especially) support might often be lacking in the smaller projects, but several websites also lack clear statements about system requirements or the legal conditions under which the software may be installed and used.  Does ‘educational use and research’ cover a local authority archives providing research services to the general public (including academics)?  Probably not, but it would presumably allow for use in a university archives.  Thanks to the wonders of interpreted programming languages (mostly Java, but Python also puts in an occasional appearance), many tools are effectively cross-platform, but it is astonishing how many projects fail to say so clearly.  This is self-evident to a developer, of course, but not at all obvious to an archivist, who will probably be worried about bringing coffee into the repository, let alone a reptile.  Oh, and if you expect your software to be compiled from source code, or require sundry other faffing around at a command line before use, I’m sorry, but your application is not “easy to implement” for ordinary mortals, as more than one site claimed.  Is it really so hard to generate binary executables for common operating systems (or if you have a good excuse – such as Archivematica, which is still in alpha development – at least provide detailed step-by-step instructions)?  Many projects of course make use of SourceForge to host code, but use another website for documentation and updates – it can be quite confusing finding your way around.  The venerable ClamAV seems to have undergone some kind of Windows conversion, and although I’m sure that Unix packages must be there somewhere, I’m damned if I could find them easily…
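
As an aside, the register itself is simple enough that it could equally live in a little script rather than a spreadsheet.  A minimal Python sketch – the field names and the sample entry are my own, purely illustrative:

    import csv

    # Field names mirror the bullet points above (my own choice of labels)
    FIELDS = ["name", "latest_stable_version", "licence", "download_url",
              "platforms", "documentation_url", "support"]

    tools = [
        {"name": "ClamAV",
         "latest_stable_version": "(check the project site)",
         "licence": "GPL v2",
         "download_url": "http://www.clamav.net/",
         "platforms": "Linux; Windows; Mac OS X",
         "documentation_url": "http://www.clamav.net/",
         "support": "mailing lists"},
    ]

    with open("tool_register.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(tools)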

All of which plays into a wider debate about just how far the modern archivist’s digital skills ought to reach (there are many other versions of this debate; the one linked – from 2006, so now quite old – just happens to be one of the most comprehensive attempts to define a required digital skill set for information practitioners).  No doubt there will be readers of this post who believe that archivists shouldn’t be dabbling in this sort of stuff at all, especially if they also work for an organisation which lacks the resources to establish a reliable infrastructure for a trusted digital repository.  And certainly I’ve been wondering lately whether some kind of archivists’ equivalent of The Programming Historian would be welcome or useful, teaching basic coding tailored to common tasks that an archivist might need to carry out.  But essentially, I don’t subscribe to the view that all archivists need to re-train as computer scientists or IT professionals.  Of course, these skills are still needed (obviously!) within the digital preservation community, but to drive a car I don’t need to be a mechanic or have a deep understanding of transport infrastructure.  Digital preservation needs to open up spaces around the periphery of the community where newcomers can experiment and learn, otherwise it will become an increasingly closed and ultimately moribund endeavour.
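
By way of illustration, the kind of exercise I have in mind for an archivists’ Programming Historian is small and concrete: walk an accession folder and write out a checksum manifest for later fixity checking.  A sketch in Python (the folder name is hypothetical):

    import csv
    import hashlib
    import os

    ACCESSION = "accession_2011_042"  # hypothetical accession folder

    def md5_of(path, chunk_size=1024 * 1024):
        """Return the MD5 checksum of a file, reading it in chunks."""
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    with open("manifest.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "md5"])
        for folder, _dirs, files in os.walk(ACCESSION):
            for name in files:
                path = os.path.join(folder, name)
                writer.writerow([path, md5_of(path)])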

Read Full Post »

8am on Saturday morning, and those hardy souls who had not yet fled home to beat Hurricane Irene, or who were stranded in Chicago, plus other assorted insomniacs, were presented with a veritable smörgåsbord of digital preservation goodness.  The programme has many of the digital sessions scheduled at the same time, and today I decided not to session-hop but to stick it out in one session in each of the morning’s two hour-long slots.

My first choice was session 502, Born-Digital Archives in Collecting Repositories: Turning Challenges into Byte-Size Opportunities, primarily an end-of-project report on the AIMS Project.  It’s been great to see many such practical digital preservation sessions at this conference, although I do slightly wonder what it will take before working with born-digital material truly becomes part of the professional mainstream.  Despite the efforts of all the speakers at sessions like this (and, in the UK, colleagues at the Digital Preservation Roadshows with which I was involved, and more recent similar events), there still appears to be a significant mental barrier which stops many archivists from giving it a go.  As the session chair began her opening remarks this morning, a woman behind me remarked, “I’m lost already”.

There may be some clues in the content of this morning’s presentations: in amongst my other work (as would be the case for most archivists, I guess) I try to keep reasonably up-to-date with recent developments in practical digital preservation.  For instance, I was already well aware of the AIMS Project, although I’d not had a previous opportunity to hear about their work in any detail, but here were yet more newly suggested tools for digital preservation.  I happen to know of FTK Imager, having used it with the MLA Yorkshire archive accession, although what wasn’t stated was that the full FTK forensics package is damn expensive, and that the free FTK Imager Lite (scroll down the page for links) is an adequate and more realistic proposition for many cash-strapped archives.  BagIt is familiar too, but Bagger, a graphical user interface to the BagIt Library, is new since I last looked (I’ll add links later – the Library of Congress site is down for maintenance).  Sleuth Kit was mentioned at the research forum earlier this week, but fiwalk (“a program that processes a disk image using the SleuthKit library and outputs its results in Digital Forensics XML”) was another new one on me, and there was even talk in this session of hardware write-blockers.  All this variety is hugely confusing for anybody who has to fit digital preservation around another day job, not to mention potentially expensive when it comes to buying hardware and software and acquiring the skills necessary to install and maintain such a jigsaw puzzle of a system.  As the project team outlined their wish list for yet another application, Hypatia, I couldn’t help wondering whether we can’t promote a little more convergence between all these different tools, both digital-preservation-specific and more general.  For instance, the requirement for a graphical drag ‘n’ drop interface to help archivists create the intellectual arrangement of a digital collection and add metadata reminded me very much of recent work at Simmons College on a graphical tool to help teach archival arrangement and description (whose name I forget, but will add it when it comes back to me!*).  I was particularly interested in the ‘access’ part of this session: the idea that FTK’s bookmark and label functions could be transformed into user-generated content tools, to enable researchers to annotate and tag records, and the use of network graphs as a visual finding aid for email collections.
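
For anyone who hasn’t met it, BagIt is a deliberately simple packaging convention: content goes into a data/ directory, alongside plain-text checksum manifests and a bag-info.txt of metadata.  Assuming the Library of Congress’s bagit-python library (an assumption – check its current documentation; this is a sketch, not tested code), bagging an accession takes only a few lines:

    import bagit  # the Library of Congress bagit-python package

    # Turn an existing folder into a bag in place: payload files move
    # into data/, and checksum manifests are written alongside.
    bag = bagit.make_bag("accession_2011_042",  # hypothetical folder
                         {"Source-Organization": "Example Record Office"})

    # Later, e.g. after the bag has been copied to preservation storage:
    bag = bagit.Bag("accession_2011_042")
    print("valid" if bag.is_valid() else "checksum mismatch!")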

The rabbit-caught-in-headlights problem seems less pronounced for archivists jumping on the Web 2.0 bandwagon, which was the theme of session 605, Acquiring Organizational Records in a Social Media World: Documentation Strategies in the Facebook Era, where we heard about the use of social media, primarily Facebook, to contact and document student activities and student societies in a number of university settings, and from a university archivist just beginning to dip her toe into Twitter.  As a strategy of working directly with student organisations and providing training to ‘student archivists’ was outlined – a method of enabling the capture of social media content (both at the point of upload and by web-crawling sites afterwards) – I was reminded of my own presentation at this conference: surely here is another example of real-life community development?  The archivist is deliberately ‘going out to where the community is’ and adapting to the community norms and schedules of the students themselves, rather than expecting the students to comply with archival rules and expectations.

This afternoon I went to learn about SNAC: the social networks and archival context project (session 710), something I’ve been hearing other people mention for a long time now but knew little about.  SNAC is extracting names (corporate, personal, family) from Encoded Archival Description (EAD) finding aids as EAC-CPF and then matching these together and with pre-existing authority records to create a single archival authorities prototype.  The hope is to then extend this authorities cooperative both nationally and potentially internationally.
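
To make the extraction step concrete: EAD 2002 finding aids mark names up with elements such as <persname>, <corpname> and <famname>, so the first pass of a SNAC-style pipeline can be straightforward XML parsing.  A sketch – not SNAC’s actual code – assuming the lxml library and the standard EAD 2002 namespace:

    from lxml import etree

    EAD_NS = {"ead": "urn:isbn:1-931666-22-9"}  # EAD 2002 namespace

    tree = etree.parse("finding_aid.xml")  # hypothetical EAD file

    # Collect personal, corporate and family names wherever they occur
    names = set()
    for tag in ("persname", "corpname", "famname"):
        for el in tree.iterfind(f".//ead:{tag}", namespaces=EAD_NS):
            text = " ".join("".join(el.itertext()).split())
            if text:
                names.add((tag, text))

    for tag, text in sorted(names):
        print(f"{tag}: {text}")

The hard part, of course, is what SNAC does next: reconciling those raw strings with one another and with existing authority records.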

My sincere thanks to the Society of American Archivists for their hospitality during the conference, and once again to those who generously funded my trip – the Archives and Records Association, University College London Graduate Conference Fund, UCL Faculty of Arts and UCL Department of Information Studies.

* UPDATE: the name of the Simmons’ archival arrangement platform is Archivopteryx (not to be confused with the Internet mail server Archiveopteryx which has an additional ‘e’ in the name)

Read Full Post »

Friday had a bit of a digital theme for me, beginning with a packed, standing-room-only session 302, Practical Approaches to Born-Digital Records: What Works Today.  After a witty introduction by Chris Prom about his Fulbright research in Dundee, a series of speakers introduced their digital preservation work, with a real emphasis on ‘you too can do this’.  I learnt about a few new tools: firefly, a tool used to scan for American social security numbers and other sensitive information – not much use in a British context, I imagine, but an interesting approach all the same – and TreeSize Professional, a graphical hard disk analyser; several projects were also making use of the Duke Data Accessioner, a tool with which I was already familiar but had never used.  During the morning session, I also popped in and out of ‘team-Brit’ session 304, Archives in the Web of Data, which discussed developments in the UK and US in opening up and linking together archival descriptive data, and session 301, Archives on the Go: Using Mobile Technologies for Your Collections, where I caught a presentation on the use of FourSquare at Stanford University.
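
The principle behind tools like firefly is simple pattern-matching over a directory tree, which is also why a British equivalent is easy to imagine: swap the social security number pattern for a (deliberately loose) National Insurance number regex.  A toy sketch of the idea – emphatically not the real tool:

    import os
    import re

    # Crude approximation of a UK National Insurance number: two letters,
    # three pairs of digits, and a final letter A-D.
    NINO = re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b")

    def scan(root):
        """Report files under root that appear to contain NI numbers."""
        for folder, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                try:
                    with open(path, encoding="utf-8",
                              errors="ignore") as f:
                        for line_no, line in enumerate(f, 1):
                            if NINO.search(line):
                                print(f"{path}:{line_no} possible NI number")
                except OSError:
                    pass  # unreadable file; skip it

    scan("accession_2011_042")  # hypothetical accession folder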

In the afternoon, I mostly concentrated on session 401, Re-arranging Arrangement and Description, with a brief foray into session 407, Faces of Diversity: Diasporic Archives and Archivists in the New Millennium.  Unless I missed this whilst I was out at the other session, nobody in session 401 mentioned the series system as a possible alternative or resolution to some of the tensions identified in a strict application of hierarchically-interpreted original order, which surprised me.  There were some hints towards a need for a more object-oriented view of description in a digital environment, and towards methods of addressing the complexity of having multiple representations (physical, digital etc.), but I have been reading my UCL colleague Jenny Bunn’s recently completed PhD thesis, Multiple Narratives, Multiple Views: Observing Archival Description, on flights for this trip, and it would have added another layer to the discussion in this session.

And continuing the digital theme, I was handed a flyer for an event coming later this year (on 6th October): Day of Digital Archives, which might interest some UK colleagues.  This is

…an initiative to raise awareness of digital archives among both users and managers. On this day, archivists, digital humanists, programmers, or anyone else creating, using, or managing digital archives are asked to devote some of their social media output (i.e. tweets, blog posts, youtube videos etc.) to describing their work with digital archives.  By collectively documenting what we do, we will be answering questions like: What are digital archives? Who uses them? How are they created and managed? Why are they important?

 

Read Full Post »

This should be the first of several posts from this year’s Society of American Archivists Annual Meeting in Chicago, which I have received generous funding to attend from UCL’s Graduate Conference Fund and from the Archives and Records Association, who asked me to blog the conference.  First impressions of a Brit: this conference is huge.  I could (and probably will) get lost inside the conference hotel, and the main programme involves parallel tracks of ten sessions at once.  And proceedings start at 8am.  This is all a bit of a shock to the system; I’m not sure anybody would turn up if you started before 9am at the earliest back home!  Anyway, the Twitter tag to watch is #saa11, although with no wifi in the session rooms, live coverage of sessions will be limited to those who can get a mobile phone signal, which is a bit of a shame.

The conference proper starts on Thursday; the beginning of the week is mostly taken up with meetings, but on Tuesday I attended an impressive range of presentations at the SAA Research Forum.  Abstracts and bios for each speaker are already online (and are linked where relevant below), and I understand that slides will follow in the next week or so.  Here are some personal highlights and things which I think may be of interest to archivists back home in the UK:

It was interesting to see several presentations on digital preservation, many reflecting similar issues and themes to those which inspired my Churchill Fellowship research and the beginning of this blog back in 2008.  Whilst I don’t think I’d recommend anyone set out to learn about digital preservation techniques the hard way, with seriously obsolete media, if you do find yourself in the position of having to deal with 5.25 inch floppy disks or the like, Karen Ballingher’s presentation on students’ work at the University of Texas at Austin had some handy links, including the UT-iSchool Digital Archaeology Lab Manual and related documentation and an open source forensics package called Sleuth Kit.  Her conclusions were more generally applicable, and familiar: the importance of documenting everything you do, including failures; planning out trials; and just do it – learn by doing a real digital preservation project.  Cal Lee was excellent (as ever) on Levels of Representation in Digital Collections, outlining a framework of digital information constructed of eight layers of representation, from the bit- (or byte-)stream up to aggregations of digital objects, and noting that archival description already supports description at multiple levels but has not yet evolved to address these multiple representation layers.  Eugenia Kim’s paper on her ChoreoSave project, to determine the metadata elements required for digital dance preservation, reminded me of several UK and European initiatives: Siobhan Davies Replay, which Eugenia herself referenced and talked about at some length; the University of the Arts London’s John Latham Archive, which I’ve blogged about previously – Eugenia commented that choreographers had found the task of entering data into the numerous metadata fields onerous, and once again it seems to me there is a tension between the event (dance, in this case) and the assumption that text offers the only or best means of describing and accessing that event; and the CASPAR research on the preservation of interactive multimedia performances at the University of Leeds.

For my current research work on user participation in archives, the following papers were particularly relevant.  Helice Koffler reported on the RLG Social Metadata Working Group’s project evaluating the impact of social media on museums, libraries and archives.  A three-part report is to be issued; part one is due for publication in September 2011, and I understand it will include some useful and much-needed definitions of ‘user interaction’ terminology.  Part 1 has moderation as its theme – Helice commented that a strict moderation policy can act as a barrier to participation (a view I share, up to a point, and will explore further in my own paper on Thursday).  Part 2 will be an analysis of the survey of social media use undertaken by the Working Group (four U.K. organisations were involved in this, although none were archives).  As my interviews with archivists would also suggest, the survey found little evidence of serious problems with spam or abusive behaviour on museum, library and archive contributory platforms.  Ixchel Faniel reported on University of Michigan research into whether trust matters for re-use decisions.

With my UKAD hat on, the blue-sky (sorry, I hate that term, but I think it’s appropriate in this instance) thinking on archival description methods which emerged from the Radcliffe Workshop on Technology and Archival Processing was particularly inspiring.  The workshop was a two-day event which brought together invited technologists (many of whom had not previously encountered archives at all) and archivists to brainstorm new thinking on ways to tackle cataloguing backlogs, streamline cataloguing workflows and improve access to archives.  A collections exhibition was used to spark discussion, together with specially written use cases and scenarios to guide each day’s sessions.  Suggestions included the use of foot-pedal-operated overhead cameras to enable archival material to be digitised either at the point of accessioning or during arrangement and description, and experimenting with ‘trusted crowdsourcing’ – asking archivists to check documents for sensitivity – as a first step towards automating the redaction of confidential information.  These last two suggestions reminded me of two recent projects at The National Archives in the U.K.: John Sheridan’s work to promote expert input into legislation.gov.uk (does anyone have a better link?) and the proposal to use text mining on closed record series which was presented to DSG in 2009.  Adam Kreisberg presented on the development of a toolkit for running focus groups by the Archival Metrics Project.  The toolkit will be tested with a sample session based upon archives’ use of social media, which I think could be very valuable for U.K. archivists.

Finally – only because I couldn’t fit this one into any of the categories above – I found Heather Soyka and Eliot Wilczek’s questions on how modern counter-insurgency warfare can be documented intriguing and thought-provoking.

Read Full Post »

In conversation with the very excellent RunCoCo project at Oxford University last Friday, I revisited a question which will, I think, prove central to my current research – establishing trust in an online archival environment.  This is an important issue both for community archives, such as Oxford’s Great War Archive, and for conventional Archive Services which are taking steps to open up their data to user input in some way – whether this be (for example) by enabling user comments on the catalogue, or establishing a wiki, or perhaps making digitised images available on flickr.

A simple, practical scenario to surface some of the issues:

An image posted to flickr with minimal description.  Two flickr users, one clearly a member of staff at the Archives concerned, have posted suggested identifications.  Since they both in fact offer the same name (“Britannia Mill”), it is not immediately clear whether they both refer to the same location, or whether the second comment contradicts the first.

  • Which comment (if either) correctly identifies the image?
  • Would you be inclined to trust an identification from a member of staff more readily than you’d accept “Arkwright”’s comment?  If so, why?
  • Clicking on “Arkwright”’s profile, we learn that he is a pensioner who lives locally.  Does this alter your view of the relative trustworthiness of the two comments (for all we know, the member of staff might have moved into the area just last week)?
  • How could you test the veracity of the comments?  Whose responsibility is this?  If you feel it’s the responsibility of the Archive Service in question, what resources might be available for this work?
  • If you worked for the Archive Service, would you feel happy to incorporate information derived from these comments into the organisation’s finding aids?  Bear in mind that any would-be user searching for images of “Britannia Mills” – wherever the location – would not find this image using the organisation’s standard online catalogue: is potentially unreliable information better than no information at all?
  • What would you consider an ‘acceptable’ quality and/or quantity level for catalogue metadata for public presentation?

You might think this photograph should never have been uploaded to flickr in its current state – but even this meagre level of description has been sufficient to start an interesting – potentially useful? – discussion.  In the same way, a relatively poor quality scan has been ‘good enough’ to enable public access outside of the repository, although it would certainly not suffice for print publication, for example.

Such ambivalence and uncertainty about accepting user contributions is one reason that The National Archives’ wiki, Your Archives, was initially designed “to be ‘complementary’ to the organisation’s existing material” rather than fully integrated into TNA’s website.

In our discussion on Friday, we identified four ways in which online archives might try to establish trust in user contributions:

  • User Profiles: enabling users to provide background information on their expertise.  The Polar Bear Expedition Archives at the University of Michigan have experimented with this approach for registered users of the site, with apparently ambiguous results.  Similar features are available on the Your Archives wiki, although similarly, few users appear to use them, except for staff of TNA.  Surfacing the organisational allegiance of staff is of course important, but would not inherently make their comments more trustworthy (as discussed above), unless more in-depth information about their qualifications and areas of expert knowledge is also provided.  A related debate about whether or not to allow anonymous comments, and the reliability of online anonymous contributions, extends well beyond the archival domain.
  • Shifting the burden of proof to the user: offering to make corrections to organisational finding aids upon receipt of appropriate documentation.  This is another technique pioneered on the Polar Bear Expedition Archives site, but it might become burdensome given a particularly active user community.
  • Providing user statistics and/or manipulating the presentation of user contributions on the basis of user statistics: i.e. giving more weight to contributions from users whose previous comments have proved to be reliable (a toy sketch of this kind of weighting follows the list).  Such techniques are used on Wikipedia (users can earn enhanced editing rights by gaining the trust of other editors), and user information is available from Your Archives, although it is somewhat cumbersome to extract – in its current form, I think it is unlikely anybody would use this information to form reliability judgements.  This technique is sometimes also combined with…
  • Rating systems: these can be either organisation-defined ratings (as on, for instance, the Brooklyn Museum Collection Online – I do not know of an archives example) or user-defined (the familiar Amazon or eBay ranking systems – but, again, I can’t think of an instance where such a system has been implemented in an archives context, although it is often talked about – can you?).  Flickr implements a similar principle, whereby registered users can ‘favourite’ images.
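
To make the third technique concrete: the simplest weighting is just a smoothed ratio of a contributor’s previously verified comments.  A toy sketch – the figures and the smoothing are mine, purely for illustration:

    def reliability(accepted, rejected, prior_weight=2):
        """Estimate contributor reliability from their track record.

        The smoothing term means a brand-new user starts at 0.5, so a
        single comment cannot push them to either extreme.
        """
        return (accepted + prior_weight * 0.5) / \
               (accepted + rejected + prior_weight)

    # Hypothetical track records: a staff member vs. "Arkwright"
    for name, (ok, bad) in {"staff member": (3, 0),
                            "Arkwright": (25, 1)}.items():
        print(f"{name}: weight {reliability(ok, bad):.2f}")

On figures like these, the long-standing local contributor would actually outrank the member of staff – which is rather the point of the Britannia Mill scenario above.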

A quick scan of Google Scholar reveals much research into establishing trust in the online marketplace, and into trust-building in the digital environment as a customer relationship management activity.  But are these commercial models necessarily directly applicable to information exchange in the archives environment, where the issue at stake is not so much the customer’s trust in the organisation or project concerned (although this clearly has an impact on other forms of trust) as the veracity and reliability of the historical information presented?

Do you have any other suggestions for techniques which could be (or are) used to establish trust in online archives, or further good examples of the four techniques outlined above in archival practice?  It strikes me that all four options rely heavily upon human interpretation and judgement calls, so scalability will become an issue with very large datasets (particularly those held outside of an organisational website) which the Archives may want to manipulate machine-to-machine (see this recent blog post and comments from the Brooklyn Museum).

Read Full Post »

Chris Prom’s talk yesterday at the Society of Archivists’ Data Standards Group meeting, on his Fulbright research ‘Tools for implementing Digital Preservation Standards’ for the ‘under-resourced’ archive (presentation slides should be available here shortly), has finally spurred me into posting a roundup of projects which I’ve encountered over the last couple of months, and which are specifically relevant to digital preservation in a small archives repository.

When I embarked upon my Churchill Fellowship in 2008, practical implementations of digital preservation research were only occurring in large repositories, usually at a national or sometimes state level.  With the notable exception of the Paradigm project and related work at Oxford University, there had been few attempts to scale down the large programmes, or to package up the various tools available with the products of the digital library/repository world, as envisaged by the 2007 UNESCO report Towards an Open Source Archival Repository and Preservation System.  The smaller programmes I did visit were generally concentrating on a niche subset of digital archives (for example, email or web archives).

Dedicated followers of digital preservation issues are probably already aware of the RODA repository created on a Fedora base by the Portuguese National Archives, and may have read this review of the demo site from another UK local archivist.  Chris Prom is now embarking on a more formal assessment, and his blog postings on RODA (and the evaluation criteria he is using) make for worthwhile reading.  RODA is likely to be of particular interest to UK-based archivists who use the collections management software package CALM, since this is also in use at the Portuguese National Archives, although there doesn’t seem to have been any attempt to date to link the two together.  The obvious question: what happens with a hybrid accession?

Chris also introduced yesterday’s audience to a new project, Archivematica, which is packaging already-available open source preservation tools into an Ubuntu Linux-based virtual appliance.  As the project’s wiki explains, ‘This means an entire suite of digital preservation tools is now available to the average archivist from one simple installation’.  This is a really exciting development and I am looking forward to seeing the results of Chris’s evaluation.  Archivematica is developed by the same Canadian team, Artefactual Systems, who are behind the ICA-AtoM archival description software commissioned by the International Council on Archives.

Closer to home, since I am involved on the board for one of the projects, it is remiss of me not to have mentioned before on this blog the digital curation work going on at Gloucestershire Archives, although the website itself has only been made available relatively recently.  This work is the first real attempt to develop a practical digital curation architecture in a UK local authority archives setting (as opposed to the piecemeal re-use of existing tools).  Plenty to explore here.

And finally, on a less technical level, but nevertheless, I think, an important development.  At the sixth of the Society of Archivists’ roadshows in December 2009, I was delighted to hear of Kevin Bolton‘s work in drawing up simple accessioning checklists for digital archives at Manchester Archives and Local Studies, and – most importantly – how these are being developed regionally for the North West, in conjunction with Cheshire Archives and Local Studies.  Particularly at this time of economic recession (or are we supposed to be out of that now?) I believe it is vital that smaller archives pool their resources and work in partnership to find solutions to digital archives issues, and it is good to see a framework for the future being mapped out here in the North West.

Read Full Post »

Just a quick place-marking type post to point people towards the presentation slides from today’s Edinburgh Digital Preservation Roadshow, particularly those from Jane Brown on the National Archives of Scotland’s Digital Data Archive.  NAS has written an in-house workflow tool for ingest in .NET and, interestingly, is proposing to follow the Australians in an up-front normalisation-to-open-formats strategy, rather than the migration model favoured by The National Archives in London.  Local archivists may be particularly interested because NAS are users of the CALM collections management system, which is in widespread use across the sector in the UK.  I wondered during the presentation why I’d not encountered the Digital Data Archive before, but apparently this was its first public outing.  It looks an impressive achievement, considering that they estimate the total staff resource on the project to approximate to only one full-time equivalent over the last four to five years.  I am due to visit the NAS soon, so hopefully more to follow…
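
For readers unfamiliar with the distinction: normalisation converts files into preservation-friendly (typically open) formats at the point of ingest, whereas migration defers conversion until a format is judged to be at risk.  A crude sketch of normalisation-on-ingest, assuming LibreOffice is installed and using a format mapping of my own invention (NAS’s actual rules will certainly differ):

    import os
    import subprocess

    # Illustrative mapping from proprietary extensions to open targets
    NORMALISE_TO = {".doc": "odt", ".xls": "ods", ".ppt": "odp"}

    def normalise(src_dir, out_dir):
        """Convert known proprietary formats to open equivalents."""
        os.makedirs(out_dir, exist_ok=True)
        for folder, _dirs, files in os.walk(src_dir):
            for name in files:
                ext = os.path.splitext(name)[1].lower()
                target = NORMALISE_TO.get(ext)
                if target:
                    subprocess.run(
                        ["soffice", "--headless", "--convert-to", target,
                         "--outdir", out_dir,
                         os.path.join(folder, name)],
                        check=True)

    normalise("accession_2011_042", "normalised")  # hypothetical folders

In practice the original bitstreams would be retained alongside the normalised copies, of course.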

Read Full Post »
