8am on Saturday morning, and those hardy souls who had not yet fled to beat Hurricane Irene home or who were stranded in Chicago, plus other assorted insomniacs, were presented with a veritable smörgåsbord of digital preservation goodness.  The programme has many of the digital sessions scheduled at the same time, and today I decided not to session-hop but to stick it out in one session in each of the morning’s two hour-long slots.

My first choice was session 502, Born-Digital archives in Collecting Repositories: Turning Challenges into Byte-Size Opportunities, primarily an end-of-project report on the AIMS Project.  It’s been great to see many such practical digital preservation sessions at this conference, although I do slightly wonder what it will take before working with born-digital truly becomes part of the professional mainstream.  Despite the efforts of all the speakers at sessions like this (and in the UK, colleagues at the Digital Preservation Roadshows with which I was involved, and more recent similar events), there still appears to be a significant mental barrier which stops many archivists from giving it a go.  As the session chair began her opening remarks this morning, a woman behind me remarked “I’m lost already”.

There may be some clues in the content of this morning’s presentations: in amongst my other work (as would be the case for most archivists, I guess) I try to keep reasonably up-to-date with recent developments in practical digital preservation.  For instance, I was already well aware of the AIMS Project, although I’d not had a previous opportunity to hear about their work in any detail, but here were yet more new suggested tools for digital preservation: I happen to know of FTK Imager, having used it with the MLA Yorkshire archive accession, although what wasn’t stated was that the full FTK forensics package is damn expensive and the free FTK Imager Lite (scroll down the page for links) is an adequate and more realistic proposition for many cash-strapped archives.  BagIt is familiar too, but Bagger, a graphical user interface to the BagIt Library, is new since I last looked (I’ll add links later – the Library of Congress site is down for maintenance).  Sleuth Kit was mentioned at the research forum earlier this week, but fiwalk (“a program that processes a disk image using the SleuthKit library and outputs its results in Digital Forensics XML”) was another new one on me, and there was even talk in this session of hardware write-blockers.  All this variety is hugely confusing for anybody who has to fit digital preservation around another day job, not to mention potentially expensive when it comes to buying hardware and software and acquiring the skills necessary to install and maintain such a jigsaw puzzle system.  As the project team outlined their wish list for yet another application, Hypatia, I couldn’t help wondering whether we can’t promote a little more convergence between all these different tools, both digital preservation specific and more general.  
For instance, the requirement for a graphical drag ‘n’ drop interface to help archivists create the intellectual arrangement of a digital collection and add metadata reminded me very much of recent work at Simmons College on a graphical tool to help teach archival arrangement and description (whose name I forget, but will add it when it comes back to me!*).  I was particularly interested in the ‘access’ part of this session, especially the idea that FTK’s bookmark and label functions could be transformed into user generated content tools, to enable researchers to annotate and tag records, and in the use of network graphs as a visual finding aid for email collections.

The rabbit-caught-in-headlights problem seems less of an issue for archivists jumping on the Web 2.0 bandwagon, which was the theme of session 605, Acquiring Organizational Records in a Social Media World: Documentation Strategies in the Facebook Era.  Here we heard about the use of social media, primarily Facebook, to contact and document student activities and student societies in a number of university settings, and from a university archivist just beginning to dip her toe into Twitter.  As the speakers outlined a strategy of working directly with student organisations and training ‘student archivists’ as a way of capturing social media content (both simultaneously with upload and by web-crawling sites afterwards), I was reminded of my own presentation at this conference: surely here is another example of real-life community development? The archivist is deliberately ‘going out to where the community is’ and adapting to the community norms and schedules of the students, rather than expecting the students to comply with archival rules and expectations.

This afternoon I went to learn about SNAC: the Social Networks and Archival Context project (session 710), something I’ve been hearing other people mention for a long time now but knew little about.  SNAC is extracting names (corporate, personal, family) from Encoded Archival Description (EAD) finding aids as EAC-CPF records, and then matching these against one another and against pre-existing authority records to create a single archival authorities prototype.  The hope is then to extend this authorities cooperative both nationally and potentially internationally.
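
To make that first extraction step a little more concrete, here is a minimal Python sketch of pulling name elements out of a finding aid. It is only an illustration under stated assumptions: the EAD fragment is invented and un-namespaced (in the style of EAD 2002), `extract_names` is a hypothetical helper of my own, and nothing here represents SNAC's actual pipeline, which also has to match names and generate EAC-CPF records.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment (not from a real finding aid).
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did>
      <origination><persname>Smith, Jane, 1901-1980</persname></origination>
    </did>
    <controlaccess>
      <corpname>Example Dance Company</corpname>
      <famname>Smith family</famname>
      <persname>Smith, Jane, 1901-1980</persname>
    </controlaccess>
  </archdesc>
</ead>
"""

# EAD elements for personal, corporate and family names.
NAME_TAGS = ("persname", "corpname", "famname")


def extract_names(ead_xml):
    """Return a sorted, de-duplicated list of (element name, name text) pairs."""
    root = ET.fromstring(ead_xml.strip())
    found = set()
    for tag in NAME_TAGS:
        for element in root.iter(tag):
            # Join any nested text and collapse runs of whitespace.
            text = " ".join("".join(element.itertext()).split())
            if text:
                found.add((tag, text))
    return sorted(found)


if __name__ == "__main__":
    for tag, name in extract_names(SAMPLE_EAD):
        print(tag, "|", name)
```

Exact-string de-duplication is the crudest possible form of matching; the hard part of a project like SNAC is presumably deciding when two differently written names refer to the same person, family or organisation, which this sketch does not attempt.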

My sincere thanks to the Society of American Archivists for their hospitality during the conference, and once again to those who generously funded my trip – the Archives and Records Association, University College London Graduate Conference Fund, UCL Faculty of Arts and UCL Department of Information Studies.

* UPDATE: the name of the Simmons’ archival arrangement platform is Archivopteryx (not to be confused with the Internet mail server Archiveopteryx which has an additional ‘e’ in the name)

Day 1 Proper of the conference began with acknowledgements to the organisers, some kind of raffle draw and then a plenary address by an American radio journalist.  Altogether this conference has a celebratory feel to it – fitting since this is SAA’s 75th Anniversary year, but very different in tone from the UK conferences where the opening keynote speaker tends to be some archival luminary.  More on the American archival cultural experience later.

My session with Kate Theimer (of ArchivesNext fame) and Dr Elizabeth Yakel from the University of Michigan (probably best known amongst tech savvy UK practitioners for her work on the Polar Bear Expedition Finding Aid) followed immediately afterwards, and seemed to go well.  The session title was: “What Happens After ‘Here Comes Everybody’: An Examination of Participatory Archives”.  Kate proposed a new definition for Participatory Archives, distinguishing between participation and engagement (outreach); Beth spoke about credibility and trust, and my contribution was primarily concerned with contributors’ motivations to participate.  A couple of people, Lori Satter and Mimi Dionne, have already blogged about the session (did I really say that?!), and here are my slides:

After lunch, I indulged in a little session-hopping, beginning in session 204 hearing about Jean Dryden’s copyright survey of American institutions, which asked whether copyright limits access to archives by restricting digitisation activity.  Dryden found that American archivists tended to take a very conservative approach to copyright expiry terms and obtaining third party permission for use, even though many interviewees felt that it would be good to take a bolder line.  Also, some archivists’ knowledge of American copyright law was shaky – sounds familiar!  It would be interesting to see how UK attitudes would compare; I suspect results would be similar.  However, I also wonder how easy it is in practical terms to suddenly start taking more of a risk-management approach to copyright after many years of insisting upon strict copyright compliance.

Next I switched to session 207, The Future is Now: New Tools to Address Archival Challenges, hearing Maria Esteva speak about some interesting collaborative work between the Texas Advanced Computing Center and NARA on visual finding aids, similar to the Australian Visible Archive research project. At the Exhibit Hall later, I picked up some leaflets about other NARA Applied Research projects and tools for file format conversion, data mining and record type identification which were discussed by other speakers in this session.

Continuing the digitization theme, although with a much more philosophical focus, Joan Schwartz in session 210, Genuine Encounter, Authentic Relationships: Archival Covenant & Professional Self-Understanding, discussed the loss of materiality and context resulting from the digitisation of photographs (for example, a thumbnail image presented out of its album context).  She commented that archivists are often too caught up with the ‘how’ of digitization rather than the ‘why’.  I wouldn’t disagree with that.

Back to the American archival cultural experience, I was invited to the University of Michigan ‘alumni mixer’ in the evening – a drinks reception with some short speeches updating alumni on staff news and recent developments in the archival education courses at the university.  All in all, archives students are much in evidence here: there are special student ‘ribbons’ to attach to name badges, many students are presenting posters on their work, and there is a special careers area where face-to-face advice is available from more senior members of SAA, current job postings are advertised, and new members of the profession can even pin up their curriculum vitae.  Some of this (the public posting of CVs in particular) might seem a bit pushy for UK tastes, and the one year length of UK Masters programmes (and the timing of Conference) of course precludes the presentation of student dissertation work.  But the general atmosphere seems very supportive of new entrants to the profession, and I feel there are ideas here that ARA’s New Professionals section might like to consider for future ARA Conferences.

This should be the first of several posts from this year’s Society of American Archivists Annual Meeting in Chicago, which I have received generous funding to attend from UCL’s Graduate Conference Fund and from the Archives and Records Association, who asked me to blog the conference.  First impressions of a Brit: this conference is huge.  I could (and probably will) get lost inside the conference hotel, and the main programme involves parallel tracks of ten sessions at once.  And proceedings start at 8am.  This is all a bit of a shock to the system; not sure anybody would turn up if you started before 9am at the earliest back home! Anyway, the Twitter tag to watch is #saa11, although with no wifi in the session rooms, live coverage of sessions will be limited to those who can get a mobile phone signal, which is a bit of a shame.

The conference proper starts on Thursday; the beginning of the week is mostly taken up with meetings, but on Tuesday I attended an impressive range of presentations at the SAA Research Forum.  Abstracts and bios for each speaker are already online (and are linked where relevant below), and I understand that slides will follow in the next week or so.  Here are some personal highlights and things which I think may be of interest to archivists back home in the UK:

It was interesting to see several presentations on digital preservation, many reflecting similar issues and themes to those which inspired my Churchill Fellowship research and the beginning of this blog back in 2008.  Whilst I don’t think I’d recommend anyone set out to learn about digital preservation techniques the hard way with seriously obsolete media, if you do find yourself in the position of having to deal with 5.25 inch floppy disks or the like, Karen Ballingher’s presentation on students’ work at the University of Texas – Austin had some handy links, including the UT-iSchool Digital Archaeology Lab Manual and related documentation and an open source forensics package called Sleuth Kit.  Her conclusions were more generally applicable, and familiar: the importance of documenting everything you do, including failures; planning out trials; and just do it – learn by doing a real digital preservation project.  Cal Lee was excellent (as ever) on Levels of Representation in Digital Collections, outlining a framework of digital information constructed of eight layers of representation from the bit(byte-)stream to aggregations of digital objects, and noting that archival description already supports description at multiple levels but has not yet evolved to address these multiple representation layers.  
Eugenia Kim’s paper on her ChoreoSave project to determine the metadata elements required for digital dance preservation reminded me of several UK and European initiatives: Siobhan Davies Replay, which Eugenia herself referenced and talked about at some length; the University of the Arts London’s John Latham Archive, which I’ve blogged about previously, because Eugenia commented that choreographers had found the task of entering data into the numerous metadata fields onerous – once again it seems to me there is a tension between the (dance, in this case) event and the assumption that text offers the only or best means of describing and accessing that event; and the CASPAR research on the preservation of interactive multimedia performances at the University of Leeds.

For my current research work on user participation in archives, the following papers were particularly relevant.  Helice Koffler’s report covered the RLG Social Metadata Working Group’s project on evaluating the impact of social media on museums, libraries and archives.  A three-part report is to be issued; part one is due for publication in September 2011.  I understand that this will include some useful and much-needed definitions of ‘user interaction’ terminology.  Part 1 has moderation as its theme – Helice commented that a strict moderation policy can act as a barrier to participation (something I agree with up to a point – and will explore further in my own paper on Thursday).  Part 2 will be an analysis of the survey of social media use undertaken by the Working Group (4 U.K. organisations were involved in this, although none were archives).  As my interviews with archivists would also suggest, the survey found little evidence of serious problems with spam or abusive behaviour on MLA contributory platforms.  Ixchel Faniel reported on University of Michigan research on whether trust matters for re-use decisions.

With my UKAD hat on, the blue sky (sorry, I hate that term, but I think it’s appropriate in this instance) thinking on archival description methods which emerged from the Radcliffe Workshop on Technology and Archival Processing was particularly inspiring.  The workshop was a two-day event which brought together invited technologists (many of whom had not previously encountered archives at all) and archivists to brainstorm new thinking on ways to tackle cataloguing backlogs, streamline cataloguing workflows and improve access to archives.  A collections exhibition was used to spark discussion, together with specially written use cases and scenarios to guide each day’s discussion.  Suggestions included the use of foot-pedal operated overhead cameras to enable archival material to be digitised either at the point of accessioning, or during arrangement and description, and experimenting with ‘trusted crowdsourcing’ – asking archivists to check documents for sensitivity – as a first step towards automating the redaction of confidential information.  These last two suggestions reminded me of two recent projects at The National Archives in the U.K. – John Sheridan’s work to promote expert input into legislation.gov.uk (does anyone have a better link?) and the proposal to use text mining on closed record series which was presented to DSG in 2009.  Adam Kreisberg presented on the development of a toolkit for running focus groups by the Archival Metrics Project.  The toolkit will be tested with a sample session based upon archives’ use of social media, which I think could be very valuable for U.K. archivists.

Finally, and only because I couldn’t fit this one into any of the categories above, I found Heather Soyka and Eliot Wilczek’s questions on how modern counter-insurgency warfare can be documented intriguing and thought-provoking.

Today I have a guest post about my research on UKOLN’s Cultural Heritage Blog.

I am extremely lucky to have been offered a student place helping out at ECDL 2010, the European Conference on Research and Advanced Technology for Digital Libraries. The following are the highlights from day 1 of the conference for this archivist let loose in the virtual stacks:

Susan Dumais’ keynote presented recent Microsoft research into the temporal dynamics of the web, analysing both changes to content and how people revisit web pages, checking for new content or looking for previously found information. She argued that the current generation of web browsers offer only a static, snapshot view, and went on to illustrate a browser plugin called DiffIE which highlights what has changed on a web page since the user’s last visit. She also presented some initial evaluation of this tool, which indicated that although perceptions of revisitation frequency remained constant, in practice users of the plugin increased their revisitation rate. There are lots of potential applications for this kind of tool for archives – from the presentation of web archives to the user interactions/annotations/ratings examples that Dumais herself gave. She also spoke about the implications of her research for the ranking and presentation of search results, illustrating how the pertinency and hence relevancy of certain terms can decline over time – for example, a user searching for ‘US Open’ this week is more likely to be looking for information on the tennis grand slam than the golf tournament. Again, there are some interesting implications here for archival catalogue and document search systems.
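
For anyone curious what ‘highlighting what has changed since the last visit’ might look like at its very simplest, here is a small Python sketch using the standard library’s `difflib`. It is purely my own illustration, not how DiffIE works: the `changed_lines` helper and the two page snapshots are made up, and a real tool would compare rendered page content rather than plain lines of text.

```python
import difflib


def changed_lines(previous, current):
    """Return (added, removed) lines between two text snapshots of a page."""
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    added, removed = [], []
    for line in diff:
        if line.startswith("+ "):
            added.append(line[2:])    # present only in the current snapshot
        elif line.startswith("- "):
            removed.append(line[2:])  # present only in the previous snapshot
        # Lines starting with "  " are unchanged and "? " are hint lines; skip both.
    return added, removed


if __name__ == "__main__":
    # Hypothetical snapshots of an archive's web page on two different visits.
    last_visit = "Opening hours: 9-5\nNew accessions: none"
    this_visit = "Opening hours: 9-5\nNew accessions: Smith family papers"
    added, removed = changed_lines(last_visit, this_visit)
    print("Added:", added)
    print("Removed:", removed)
```

The added lines are what an interface might highlight for a returning visitor; the same comparison could in principle be run between two captures of a page held in a web archive.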

Christos Papatheodorou from the Ionian University on Corfu spoke about the mapping of disparate cultural heritage (archives, museums, libraries) XML-based metadata schema to the CIDOC CRM ontology, and went on to describe the transformation of XPath queries submitted to a local (XML) data source into equivalent queries suitable to be submitted to other data sources, via the CIDOC CRM ontology. Having travelled up to Glasgow on the sleeper, arriving at 7 in the morning, I confess I got a bit lost in the technicalities from this point onwards, but the basic idea is to use CIDOC CRM as a mediator between disparate cultural heritage sources marked up in different XML schema. There was an extended worked example using EAD, which was nice to hear. In general, it has been interesting to observe a large number of papers at this conference which report experiments based upon data from cultural heritage rather than scientific domains. All of which tends to reinforce my thoughts after the Society of Archivists’ Conference about attracting technology experts to work in the archives sector: cultural heritage data is complex and thus, it seems, fascinating and intrinsically motivating to work with. We should be more proactive about promoting archival data to this kind of digital research community.

I’d been particularly looking forward to the paper on User-Contributed Metadata for Libraries and Cultural Institutions, although this turned out to be a Drexel University re-working of the Library of Congress Flickr Commons experience, albeit concentrating more on user comments and less upon tagging. I was not quite comfortable anyway with the a priori categorization of comments described in the paper (into 1. personal and historical; 2. link out, e.g. to Wikipedia; 3. corrections and translations; 4. link in, e.g. adding images to Flickr groups – it seems to me that category 1 includes a particularly wide range of possible comment types), plus all the things I wanted to ask about seemed to be listed as ‘future research’. These include a fuller categorization, exploring motivations for adding comments, the presentation of comments in the user interface, and librarians’ (or archivists’) role in moderating user interaction.

I also enjoyed a couple of papers which presented ideas to do with improving information visualisation and user judgement using colours, layout and social navigation, all of which have some potential relevancy to the question of how best to present user-generated content.

Research and Advanced Technology for Digital Libraries, Proceedings of the 14th European Conference, ECDL 2010, Glasgow, UK, September 2010 is published as Lecture Notes in Computer Science 6273, available via SpringerLink, for those of you who have access.

And I have travelled twice on Glasgow’s baby underground train 🙂

I had a day at the Society of Archivists’ Conference 2010 in Manchester last Thursday; rather a mixed bag. I wasn’t there in time for the first couple of papers, but caught the main strand on digital preservation after the coffee break. It’s really good to see digital preservation issues get such a prominent billing (especially as I understand there were few sessions on digital preservation at the much larger Society of American Archivists’ Conference this year), although I was slightly disappointed that the papers were essentially show and tell rehearsals of how various organisations are tackling the digital challenge. I have given exactly this type of presentation at the Society’s Digital Preservation Roadshows and at various other beginners/introductory digital preservation events over the past year.  Sometimes of course this is precisely what is needed to get the nervous to engage with the practical realities of digital preservation, but all the same, it’s a pity that one or more of the papers at the main UK professional conference of the year did not develop the theme a little more and stimulate some discussion on the wider implications of digital archives.  However, it was interesting to see how the speakers assumed familiarity with OAIS and digital preservation concepts such as emulation. I suspect some of the audience were left rather bewildered by this, but the fact that speakers at an archives conference feel they can make such assumptions about audience understanding does at least suggest that some awareness of digital preservation theory and frameworks is at last crawling into the professional mainstream.

I was interested in Meena Gautam’s description of the National Archives of India’s preparations for receiving digital content, which included a strategy for recruiting staff with relevant expertise. Given India’s riches in terms of qualified IT professionals, I would have expected a large pool of skilled people from which to recruit. But the direction of her talk seemed to suggest that, in actual fact, NAI is finding it difficult to attract the experts they require. [There was one particular comment – that the NAI considers conversion to microfilm to be the current best solution for preserving born-digital content – which seemed particularly extraordinary, although I have since discovered the website of the Indian National Digital Preservation Programme, which does suggest that the Indian Government is thinking beyond this analogue paradigm.]  Anyway, NAI are not alone in encountering difficulties in attracting technically skilled staff to work in the archives sector.  I assume that the reason for this is principally economic, in that people with IT qualifications can earn considerably more working in the private sector.

It was a shame that there was not an opportunity for questions at the end of the session, as I would have liked to ask Dr Gautam how archives could or should try to motivate computer scientists and technicians to work in the area of digital preservation.  Later in the same session, Sharon McMeekin from the Royal Commission on the Ancient and Historical Monuments of Scotland advocated that archives organisations should collaborate to build digital repositories, and I and several others amongst the Conference Twitter audience agreed.  But from observation of the real archives world, I would suggest that, although most people agree in principle that collaboration is the way forward, there is very little evidence – as yet at least – of partnership in practice. I wonder just how likely it is that joint repositories will emerge in this era of recession and budget cuts (which might be when we need collaboration most, but when in reality most organisations’ operations become increasingly internally focused).  Since it seems archives are unable to compete in attracting skilled staff in the open market, and – for a variety of reasons – it seems that the establishment of joint digital repositories is hindered by traditional organisational boundaries, I pondered whether a potential solution to both issues might lie in Yochai Benkler’s third organisational form of commons-based peer-production: as the means both to motivate a community of appropriately skilled experts to contribute their knowledge to the archives sector, and to build sustainable digital archives repositories in common.  There are already of course examples of open source development in the digital archives world (Archivematica is a good example, and many other tools, such as the National Archives of Australia’s Xena and The (UK) National Archives’ DROID, are available under open source licences), since the use of open standards fits well with the preservation objective.  
Could the archives profession build on these individual beginnings in order to stimulate or become the wider peer community needed to underpin sustainable digital preservation?

After lunch, we heard from Dr Elizabeth Shepherd and Dr Andrew Flinn on the work of the ICARUS research group at UCL’s Department of Information Studies, of which my user participation research is a small part.  It was good to see the Twitter discussion really pick up during the paper, and a good question and answer session afterwards.  Sarah Wickham has a good summary of this presentation.

Finally, at the end of the day, I helped out with the session to raise awareness of the UK Archives Discovery Network, and to gather input from the profession on how they would like UKAD to develop.  We asked for comments on post-it notes on a series of ‘impertinent questions’.  I was particularly interested in the outcome of the question based upon UKAD’s Objective 4: “In reality, there will always be backlogs of uncatalogued archives.” Are volunteers the answer?  From the responses we gathered, there does appear to be increasing professional acceptance of the use of volunteers in description activities, although I suspect our use of the word ‘volunteer’ may be holding back appreciation of an important difference between the role of ‘expert’ volunteers in archives and user participation by the crowd.

