Posts Tagged ‘small archives’

It’s been a while since I’ve posted here purely on digital preservation issues: my work has moved in other directions, although I did attend a number of the digital preservation sessions at the Society of American Archivists’ conference this summer.  I retain a keen interest in digital preservation, however, particularly in developments which might be useful for smaller archives.  Recently, I’ve been engaged in a little work for a project called DiSARM (Digital Scenarios for Archives and Records Management), preparing some teaching materials for the Masters students at UCL to work from next term, and in revising the contents of a guest lecture I present to the University of Liverpool MARM students on ‘Digital Preservation for the Small Repository’.  Consequently, I’ve been trying to catch up on the last couple of years (since I left West Yorkshire Archive Service at the end of 2009) of new digital preservation projects and research.

So what’s new?  Well, from a small archives perspective, I think the key development has been the emergence of several digital curation workflow management systems – Archivematica, Curator’s Workbench, the National Archives of Australia’s Digital Preservation Software Platform (others…?) – which package together a number of different tools to guide the archivist through a sequenced set of stages for the processing of digital content.  The currently available systems vary in their approaches to preservation, comprehensiveness, and levels of maturity, but represent a major step forward from the situation just a couple of years ago.  In 2008, if (like me when WYAS took in the MLA Yorkshire archive as a testbed) you didn’t have much (or any) money available, your only option was – as one of the former Liverpool students memorably pointed out to me – to cobble together a set of tools as best you could from old socks and a bit of string.  Now we have several offerings approaching an integrated software solution; moreover, these packages are generally open source and freely available, so would-be adopters are able to download each one and play about with it before deciding which one might suit them best.

Having said that, I still think it is important that students (and practitioners, of course) understand the preservation strategies and assumptions underlying each software suite.  When we learn how to catalogue archives, we are not trained merely to use a particular software tool.  Rather, we are taught the principles of archival description, and then we move on to see how these concepts are implemented in practice in EAD or by using specific database applications, such as (in the U.K.) CALM or Adlib.  For DiSARM, students will design a workflow and attempt to process a small sample set of digital documents using their choice of one or more of the currently available preservation tools, which they will be expected to download and install themselves.  This Do-It-Yourself approach will mirror the practical reality in many small archives, where the (frequently lone) archivist often has little access to professional IT support.  Similarly, students at UCL are not permitted to install software onto the university network.  Rather than see this as a barrier, again I prefer to treat this situation as a reflection of organisational reality.  There are a number of very good reasons why you would not want to process digital archives directly onto your organisation’s internal network, and recycling and re-purposing old computer equipment of varying technical specifications and capabilities to serve as workstations for ingest is a fact of life even, it seems, for Mellon-funded projects!

In preparation for writing this DiSARM task, I began to put together for my own reference a spreadsheet listing all the applications I could think of, or have heard referenced recently, which might be useful for preservation processing tasks in small archives.  I set out to record:

  • the version number of the latest (stable) release
  • the licence arrangements for each tool
  • the URL from which the software can be downloaded
  • basic system requirements (essentially the platform(s) on which the software can be run – we have surveyed the class and know there is a broad range of operating systems in use, including several flavours of both Linux and Windows, and Mac OS X)
  • location of further documentation for each application
  • end-user support availability (forums or mailing lists etc)
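
For anyone who would rather keep such a list in a script than a spreadsheet, here is a minimal sketch in Python of one way the record might be structured.  The field names simply mirror the bullets above, and `ToolRecord`, `missing` and `to_csv` are hypothetical names of my own choosing, not part of any existing tool.  The example entry is an invented placeholder, not data about a real application.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class ToolRecord:
    """One row of the tool inventory; field names are illustrative only."""
    name: str
    latest_stable_version: str = ""
    licence: str = ""
    download_url: str = ""
    platforms: str = ""          # e.g. "Linux; Windows; Mac OS X"
    documentation_url: str = ""
    support: str = ""            # forums, mailing lists etc.

    def missing(self):
        """Return the names of fields still left blank for this tool."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def to_csv(records):
    """Serialise the records as CSV text, one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ToolRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder entry: the values are invented for illustration only.
example = ToolRecord(name="ExampleTool", licence="(not stated on website)")
print(example.missing())   # which columns could not be filled in
print(to_csv([example]))
```

The `missing` method is there because the gaps are the interesting part: it lists exactly which details a project's website failed to supply.
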
This all proved surprisingly difficult.  I was half expecting that user-friendly documentation and (especially) support might often be lacking in the smaller projects, but several websites also lack clear statements about system requirements or the legal conditions under which the software may be installed and used.  Does ‘educational use and research’ cover a local authority archives providing research services to the general public (including academics)?  Probably not, but it would presumably allow for use in a university archives.  Thanks to the wonders of interpreted programming languages (mostly Java, but Python also puts in an occasional appearance), many tools are effectively cross-platform, but it is astonishing how many projects fail clearly to say so.  This is self-evident to a developer, of course, but not at all obvious to an archivist, who will probably be worried about bringing coffee into the repository, let alone a reptile.  Oh, and if you expect your software to be compiled from code, or require sundry other faffing around at a command line before use, I’m sorry, but your application is not “easy to implement” for ordinary mortals, as more than one site claimed.  Is it really so hard to generate binary executables for common operating systems (or if you have a good excuse – such as Archivematica which is still in alpha development – at least provide detailed step-by-step instructions)?  Many projects of course make use of SourceForge to host code, but use another website for documentation and updates – it can be quite confusing finding your way around.  The venerable ClamAV seems to have undergone some kind of Windows conversion, and although I’m sure that Unix packages must be there somewhere, I’m damned if I could find them easily…

All of which plays into a wider debate about just how far the modern archivist’s digital skills ought to reach (there are many other versions of this debate, the one linked – from 2006 so now quite old – just happens to be one of the most comprehensive attempts to define a required digital skill set for information practitioners).  No doubt there will be readers of this post who believe that archivists shouldn’t be dabbling in this sort of stuff at all, especially if s/he also works for an organisation which lacks the resources to establish a reliable infrastructure for a trusted digital repository.  And certainly I’ve been wondering lately whether some kind of archivists’ equivalent of The Programming Historian would be welcome or useful, teaching basic coding tailored to common tasks that an archivist might need to carry out.  But essentially, I don’t subscribe to the view that all archivists need to re-train as computer scientists or IT professionals.  Of course, these skills are still needed (obviously!) within the digital preservation community, but to drive a car I don’t need to be a mechanic or have a deep understanding of transport infrastructure.  Digital preservation needs to open up spaces around the periphery of the community where newcomers can experiment and learn, otherwise it will become an increasingly closed and ultimately moribund endeavour.

Read Full Post »

A round-up of a few pieces of digital goodness to cheer up a damp and dark start to October:

What looks like a bumper new issue of the Journal of the Society of Archivists (shouldn’t it be getting a new name?) is published today.  It has an oral history theme, but actually it was the two articles that don’t fit the theme which caught my eye for this blog.  Firstly, Viv Cothey’s final report on the Digital Curation project, GAip and SCAT, at Gloucestershire Archives, with which I had a minor involvement as part of the steering group for the Society of Archivists’-funded part of the work.  The demonstration software developed by the project is now available for download via the project website.  Secondly, Candida Fenton’s dissertation research on the Use of Controlled Vocabulary and Thesauri in UK Online Finding Aids will be of interest to my colleagues in the UKAD network.  The issue also carries a review, by Alan Bell, of Philip Bantin’s book Understanding Data and Information Systems for Recordkeeping, which I’ve also found a helpful way in to some of the more technical electronic records issues.  If you do not have access via the authentication delights of Shibboleth, no doubt the paper copies will be plopping through ARA members’ letterboxes shortly.

Last night, by way of supporting the UCL home team (read: total failure to achieve self-imposed writing targets), I had my first go at transcribing a page of Jeremy Bentham’s scrawled notes on Transcribe Bentham.  I found it surprisingly difficult, even on the ‘easy’ pages!  Admittedly, my paleographical skills are probably a bit rusty, and Bentham’s handwriting and neatness leave a little to be desired – he seems to have been a man in a hurry – but what I found most tricky was not being able to glance at the page as a whole and get the gist of the sentence ahead at the same time as attempting to decipher particular words.  In particular, not being able to search down the whole page looking for similar letter shapes.  The navigation tools do allow you to pan and scroll, and zoom in and out, but when you’ve got the editing page up on the screen as well as the document, you’re a bit squished for space.  Perhaps it would be easier if I had a larger monitor.  Anyway, it struck me that this type of transcription task is definitely a challenge, for people who want to get their teeth into something, not the type of thing you might dip in and out of in a spare moment (like indicommons on iPhone and iPad, for instance).

I’m interested in reward and recognition systems at the moment, and how crowdsourcing projects seek to motivate participants to contribute.  Actually, it’s surprising how many projects seem not to think about this at all – the build it and wait for them to come attitude.  Quite often, it seems, the result is that ‘they’ don’t come, so it’s interesting to see Transcribe Bentham experiment with a number of tricks for monitoring progress and encouraging people to keep on transcribing.  So, there’s the Benthamometer for checking on overall progress, you can set up a watchlist to keep an eye on pages you’ve contributed to, individual registered contributors can set up a user profile to state their credentials, chat to fellow transcribers on the discussion forum, and there’s a points system, depending on how active you are on the site, and a leader board of top transcribers.  The leader board seems to be fueling a bit of healthy transatlantic competition right at the moment, but given the ‘expert’ wanting-to-crack-a-puzzle nature of the task here, I wonder whether the more social / community-building facilities might prove more effective over the longer term than the quantitative approaches.  One to watch.

Finally, anyone with the techie skills to mashup data ought to be welcoming The National Archives’ work on designing the Open Government Licence (OGL) for public sector information in the U.K.  I haven’t (got the technical skills) but I’m welcoming it anyway in case anyone who has hasn’t yet seen the publicity about it, and because I am keen to be associated with angels.

Read Full Post »

Under the avuncular eye of fellow Pembrokian William Pitt the Younger, I was presented with my Churchill Fellowship Medallion by Her Royal Highness the Duchess of Cornwall at the City of London Guildhall on Friday 21st May.  Unfortunately, I can’t blog the picture of me receiving my medallion; partly because it’s locked down by some horrible DRM system, partly because it looks as if my head has been stuck on at the wrong angle.  I also couldn’t find a decent picture of Mr Pitt’s Guildhall monument (slightly naff, it has to be said – with Britannia riding a sea-horse – apparently the design was chosen for its cheapness rather than its artistic merit).  So here instead is a picture of the much nicer Pitt statue at Pembroke, although I have often worried that a toga is really not the best costume for sitting outside on a cold Cambridge day.  No wonder his toes are blue:


Pitt the Younger, Pembroke College, Cambridge. Photo by James UK on flickr

I was amused by the text of the inscription¹ at the Guildhall:

HE REPAIRED THE EXHAUSTED REVENUES, HE REVIVED AND INVIGORATED
THE COMMERCE AND PROSPERITY OF THE COUNTRY;
AND HE HAD RE-ESTABLISHED THE PUBLICK CREDIT ON DEEP AND SURE FOUNDATIONS;

Sounds like he’d be a handy chap to have as Prime Minister right now really, although I’m less sure about this part (just about pulls it back in the last line):

HIS INDUSTRY WAS NOT RELAXED BY CONFIDENCE IN HIS GREAT ABILITIES;
HIS INDULGENCE TO OTHERS WAS NOT ABATED BY THE CONSCIOUSNESS 
OF HIS OWN SUPERIORITY;
HIS AMBITION WAS PURE FROM ALL SELFISH MOTIVES;

Joking aside, it was a suitably grand occasion to celebrate the incredible variety of all the recent Churchill Fellowships.  After the award ceremony, 2009 Fellow Michael Kernan sought me out.  Michael is the Honorary Historian and Archivist at the Fire Service College in Gloucestershire, and wanted advice on digital preservation with regard to the Fire Service College’s collection – both for digitised archive documents and born-digital oral histories of firemen’s experiences of the Blitz.  So further proof, if proof were needed, of the ongoing relevance of the central tenet of my Fellowship – that we need to develop digital preservation solutions which scale down to the local level, as well as scale up to the (inter-)national.

I was able to point Michael towards the work in both digitisation and digital preservation taking place locally to him at Gloucestershire Archives.  This would not have been possible when I first put my Churchill Fellowship application together back in 2007.  Last week I also heard from a colleague at Staffordshire and Stoke-on-Trent Archives, where similarly they are now taking some real, practical steps towards addressing digital preservation at a local level.  I would like to think that my Churchill Fellowship has played a small part in encouraging local archivist colleagues in the UK and giving them the confidence to take up the digital archives challenge.

Coincidentally, as I was picking up my Churchill medallion at the Guildhall, Viv Cothey, the developer at Gloucestershire Archives, was speaking at the seminar, ‘Practical Approaches to Electronic Records: the Academy and Beyond’, organised by Chris Prom and held at the University of Dundee.  I was very sorry indeed to have to miss this event, but fortunately it has been covered in the blogosphere by Sue Donnelly of the LSE Archives and Simon Wilson from the University of Hull, representing another new digital preservation project, AIMS – Born Digital Collections: An Inter-Institutional Model for Stewardship.  Chris Prom will shortly be returning to Illinois at the end of his Fulbright scholarship.  I am sure that the following sentiments were expressed copiously on the day at Dundee, but I would also like to add my own personal vote of thanks to Chris for the huge contribution his project has made over the last year in discovering, developing and disseminating practical digital preservation methods and tools for ‘real’ archivists.  Safe journey home!

Edit: to add a link to Peter Cliff’s presentation from the Dundee seminar on Developing and Implementing Tools to Manage Hybrid Archives (slideshare).

¹ Copyright, apparently, George Canning – why do these people follow me about?

Read Full Post »

On 27th March (yes, I know, Easter got in the way) I attended the Rewired Culture unconference at The Guardian in London.  I’d not been to an unconference before, let alone one associated with a hackday, but I’d followed similar initiatives, such as the THATCamp series, at a distance via twitter and blog postings.  So I was intrigued – if a little nervous – to find out from the inside how such an event worked. [Coincidentally, there has been the most extraordinary flame today on the UK Records Management listserv about the concept of an unconference, which is obviously unfamiliar (excuse pun) to many records professionals in the UK.  I hope this blog post goes a little way towards demonstrating the potential value of this type of event to the archives and records sector.]

The day’s events were organised jointly by DCMS and Rewired State, a not-for-profit company whose mission is neatly summed up in their tagline ‘geeks meet government’.  Rewired Culture, which also masqueraded under the twitter hashtag #rsrc, aimed to bring together cultural ‘data owners’ (such as Museums, Libraries and Archives) with Britain’s “vibrant developer community” and “growing and active entrepreneurial base”.  The half day unconference strand (which was free, incidentally – thank you) offered an opportunity to discuss how cultural creators (ie record creators in an archive context), curators (read archivists), developers (IT professionals) and entrepreneurs can collaborate to exploit the potential of cultural content and promote innovation in a participatory web2.0 world:

How do we ensure that the exciting work already underway in a number of organizations is shared more generally, so even smaller bodies and SMEs can learn from best practice and find workable routes to market? What are the cultural content business models for the 21st century? …for data owners, entrepreneurs, data users and communites to discuss business models, funding mechanisms and challenges.

Encouraged by the promise that at an unconference, “everybody’s voice is as valid as everyone else’s”, I went along nevertheless expecting to be the only archivist in a room full of people from the big national museums.  I was pleasantly surprised, therefore, to find that fellow participants included a bunch of colleagues from The National Archives, as well as a number of other people who for a variety of reasons had an interest in smaller cultural organisations.

My own attendance was also prompted by a somewhat vaguely thought-through idea that techie/geek mashups making use of cultural content could be viewed as one extreme of a user-collaboration continuum (disclaimer: these are very much thoughts-in-progress, and need a lot more mashing!):

During Rewired Culture, I was pointed towards the work of one of the current Clore Fellows, Claire Antrobus, who is researching user-led innovation in art galleries.  There are some interesting parallels and contrasts with the archives domain here, and I like the ‘user-led innovation’ concept.

Each unconference session lasted for an hour (possibly a little too long – at times I felt the discussions would benefit from more focus, but this perhaps depends on the participants in each group and anyway, you are at liberty to ‘vote with your feet’ and join another session if you wish, something which is not usually possible in a formal conference setting).  The first session I attended discussed institutional barriers to opening up cultural data.  Some familiar themes emerged, including language barriers between ‘techies’ and ‘curators’, business drivers for engaging in new, potentially risky, areas of work at a time of significant budget cuts in the public-sector, and identifying external funding streams for technological innovation (I wondered specifically whether the regional structure of the principal archives-sector grant funder, the Heritage Lottery Fund, and the emphasis they place upon localised community outcomes for projects they support, inhibits innovation in the re-use of archival content on the internet, which is by definition global in its reach).  The session also surfaced what I felt was a misunderstanding of the positivist, Jenkinsonian theory of the archivist as passive custodian (as opposed to active interpreter) of archival content, which one museum professional present had taken as a particular reluctance amongst archivists to open up archival data.  My former employer, West Yorkshire Archive Service, has had its full electronic catalogue freely available on the internet for over ten years, which is more than can be said, even now, of many local museum services.  Admittedly there is plenty of work still to be done in making this catalogue data available in re-usable, developer-friendly formats, and there is a definite need for better data aggregators in the archives sector – the UK Archives Discovery Network may have an important role to play here.  
But it would be wrong to fail to recognise the achievements of the sector in making archival catalogue data available, and consequently to miss out on opportunities for its re-use (particularly where it is even now held as easily harvested and re-purposed Encoded Archival Description, as with the ArchivesHub and A2A federated collections).  Equally, there is perhaps a need to bring postmodernist trends in archival theory to greater prominence within the UK archives practitioner community, and to explore how such concepts might support the kind of technology- and user-mediated innovation under discussion at the Rewired Culture unconference.

Following on from this, the second session I attended considered what would make the ideal API for a cultural organisation.  Here we seemed to be back in ‘If we build it, will they come?’ territory, or to be more precise, ‘If we release open data, what do we expect developers to do with it?’.  Indeed, I agree, it would be very useful to know what use has been made of existing cultural sector APIs and datasets made available, such as that provided by the V&A Museum, or, to give an archives example, what use has been made of the NARA catalogue data that has been made available for download?  As a non-geek archivist (albeit with geek-like tendencies), I also freely admit I do not altogether understand what data formats are optimal to maximise potential for re-use, nor does the developer community seem to articulate clearly what ‘open data’ might mean in practical terms.

Finally, at the end of the afternoon, we came to the hack presentations.  I was slightly disappointed that only two of the creations (HMRC Artworks and LandingZone) made any use of actual cultural content (as opposed to information about special events or the geographical locations of cultural organisations).  Nor, as far as I know, was any use made of archives sector data (although I do not know what data was provided, and it may be that there was no suitable archive data to hand).  So the hackers had maybe breathed new life into the discoverability of collections, whereas the real promise of user-led innovation in the cultural sector, it seems to me, is to enhance meaning and understanding of collections.  However, I left thinking that a hackday with archival data could prove an interesting experiment – and something of a technical challenge, presumably, given the contextual richness and complexity of archival catalogue data, in comparison to the discrete object record of the typical museum or library catalogue.

Incidentally, for an alternative view of the same sessions, Brian Kelly has written up his impressions of the day here and here (I have similar thoughts about Saturday events!).

Read Full Post »

I hinted in the post below that there might be some changes coming up on this blog.  This is because, as some of you will already know, I have moved on from West Yorkshire Archive Service, to start a PhD jointly supervised by UCL’s Department of Information Studies and The National Archives provisionally entitled ‘We Think, Not I think: Harnessing collaborative creativity to archival practice; implications of user participation for archival theory and practice’.

This means that my interests are expanding beyond the original focus of Around the World in Eighty Gigabytes, which I originally set up to document my own voyages of discovery about digital preservation and how international initiatives in this field might be scaled down to apply within the small archives settings with which I was most familiar.  I have umm-ed and ah-ed for a bit about what I should do now – start a new blog or morph this one to cover aspects of user participation?  In the end, I have decided to continue with 80GB.  There are various reasons for this:

  • There are several common strands between digital preservation research and my current interests in user collaboration – they both relate to the impact of digital technologies on archival theory and practice, and many of the major issues (eg authority, context, trust, the cultural challenges of embedding technological change in operational settings) are debated in both areas of research.  I had been thinking that these common themes would make for a good posting on Ada Lovelace day, but I didn’t, er, quite get round to it!
  • I haven’t stopped being interested in digital preservation, or in the impact of digital technology on smaller archives, and I will continue to post on both themes when opportunities arise.
  • I want a space to express my own personal opinions on things which interest me and to explore ideas.  What I post here will not represent the views of The National Archives or UCL any more than my previous postings represented the official stance of West Yorkshire Archive Service.
  • I flatter myself to think there are a few people who read my ramblings, and know me as 80GB.  If they are interested in digital preservation and small archives, and are into following obscure blogs, I suspect they may be interested in reading about the implications of social media on archives too.
  • Putting everything together should mean that I actually update the blog rather more regularly.
  • To be blunt, there are a few events coming up that I think I will want to write about, and I can’t be bothered to set up a new blog…

However, if either of my current readers thinks that this is a really bad idea, they should please let me know in the comments…

Read Full Post »

Last Thursday I was delighted to attend the culminating workshop for the Society of Archivists’ (SoA) funded digital curation project at Gloucestershire Archives.  As Viv Cothey, the developer employed by Gloucestershire Archives, has noted, “Local authority archivists may well be fully aware of the very many exhortations to do digital curation and to get involved but are frustrated by not knowing where to start”.  Building upon previous work on a prototype desktop ingest packager (GAip), the SoA project set out to create a proof of concept demonstration of a ‘trusted digital store’ suitable for use by a local government record office.  The workshop was an important outreach element of the project, aiming to build up understanding and experience of digital curation principles and workflow amongst archivists in the UK.  I have been involved with the management board for the SoA project, so I was eager to see how the demonstration tools which have been developed would be received by the wider digital preservation and archivist professional communities.

Others are much better qualified than me to evaluate the technical approach that the project has taken, and indeed Susan Thomas has already blogged her impressions over at futureArch.  For me, what was especially pleasing was to see a good crowd of ‘ordinary’ archivists getting stuck in with the demonstration tools – despite the unfamiliarity of the Linux operating system – and teasing out the purpose and process of each of the digital curation tools provided.  I hope that nobody objects to my calling them ‘ordinary’ – I think they will know what I mean, and it is how I would describe myself in this digital preservation context.

Digital preservation research has hitherto clustered around opposite ends of a spectrum.  At one end are the high level conceptual frameworks: OAIS and the like.  At the other end are the practical developments in repository and curation workflow tools in the higher education, national repository, and scientific research communities.  The problem here is the technological jargon which is frankly incomprehensible to your average archivist.  Gloucestershire’s project therefore attempts to fill an important gap in current provision, by providing a set of training tools to promote experimentation and discourse at practitioner level.

I’ll be interested to see the feedback from the workshop, and it’d be good to see some attendee comments here…

Read Full Post »

Lots of interesting work going on at North Carolina State Archives – plenty to read on their electronic records page. One project I’d particularly like to highlight is their work on the preservation of e-mail.

E-mail seems to be one of those types of electronic record about which there’s been lots and lots of discussion about how difficult it is to preserve, but not so much (at least that I knew of) in the way of practical advice on how you might go about attempting to keep it.

As well as the very practical guidelines for users, and suggested retention periods for e-mail, staff in the North Carolina State Archives Government Records Branch have been working on a collaborative project to transform e-mail from its native format into XML for preservation. The catalyst for this project was the deposit of e-mail messages from a former North Carolina governor and his staff. The website for the e-mail project has a full set of documentation, and links to other e-mail preservation initiatives. More recently, North Carolina has been working with the Collaborative Electronic Records Project (CERP) at the Smithsonian Institution Archives and the Rockefeller Archive Center, and an XML schema for a single e-mail account has now been published.
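
To give a flavour of what transforming e-mail into XML can mean in practice, here is a minimal sketch using only the Python standard library.  It is emphatically not the North Carolina or CERP tooling, and the element names (`Message`, `Headers`, `Header`, `Body`) are hypothetical ones of my own, not the published schema.  The sample message is also invented for illustration.

```python
import xml.etree.ElementTree as ET
from email import message_from_string, policy

# Invented sample message, for illustration only.
RAW_MESSAGE = """\
From: governor@example.org
To: staff@example.org
Subject: Budget meeting
Date: Mon, 01 Mar 2004 09:30:00 -0500

Please circulate the agenda before Friday.
"""


def message_to_xml(raw_text):
    """Convert one plain-text e-mail message into a small XML element tree.

    The element names are hypothetical, NOT the published CERP schema.
    """
    msg = message_from_string(raw_text, policy=policy.default)
    root = ET.Element("Message")
    headers = ET.SubElement(root, "Headers")
    # Keep every header, in order, as a name/value pair.
    for name, value in msg.items():
        header = ET.SubElement(headers, "Header", name=name)
        header.text = str(value)
    # Take the plain-text body if there is one; attachments are ignored here.
    body_part = msg.get_body(preferencelist=("plain",))
    body = ET.SubElement(root, "Body")
    body.text = body_part.get_content() if body_part is not None else ""
    return root


xml_text = ET.tostring(message_to_xml(RAW_MESSAGE), encoding="unicode")
print(xml_text)
```

A real preservation workflow would also need to cope with attachments, folder structure and whole accounts, which is presumably where a full schema earns its keep; this sketch handles a single simple message only.
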

I have also visited the Smithsonian Institution Archives, who have developed some automated tools to help with the processing of e-mail archives, which they hope to make available on their website in due course. The CERP Project will be of particular interest to UK local archives, since this work has been achieved with an emphasis on low-cost solutions suitable for small and medium-sized organisations.

Read Full Post »
