
Lonely Hearts at #UKAD

A quick post in support of my lonely hearts ads for this year's UKAD Forum.  I've submitted two – slightly concerned this makes me look rather archivally-geekily dissolute… Anyway, these were inspired by a chance conversation on Twitter a few weeks back with a couple of archivists who had signed up in January for the Code Year lessons, but had found it hard going and fallen behind.

So firstly:

  • Digital professional, likes history, cake, structure & logic, hates dust, WLTM archivists interested in learning programming, for fun and comradeship.
I've posted here previously that I have occasionally mulled over the possibility/feasibility of some kind of online basic programming tutorial for archivists, and this even gathered a very welcome offer of assistance.  But I hadn't taken the idea any further, for a couple of reasons: (a) I wasn't sure of demand, and (b) I think it's really important that any tutorial should be based around real, practical archival scenarios.  I know from experience that it can be difficult to learn tech stuff (well, perhaps any stuff) if you can't see how you might apply it in personally relevant contexts.  So, if you're an archivist, what I'd like to find out in the comments to this post is why you're interested in learning how to program – specifically, in which archives-related tasks you hope such skills could usefully be applied.
And secondly:
  • Tech-loving archivist seeks passionate, patient devs with GSOH to help teach archivists to code.
Because I know I couldn't put together what I have in mind on my own, and because I'd be embarrassed to show any of my code to anyone.  Those two things are linked, actually: on a good day, with a following wind and plenty of coffee and swearing, I can cobble together some lines of code which do something useful (for my purposes).  I am all too aware that I am using perhaps 5-10% of the power of any given language, but then again, if it works (eventually, usually!) for my purposes, perhaps 5% is all the function I require (plus the confidence to explore and experiment).  I need any real programmers interested in helping out to understand all of that.  The aim here would be to put together a simple tutorial for beginners based around day-to-day archival tasks.  From programmers, I'd be interested in ideas of how to put together this tutorial, including what language(s) you might recommend and why.
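To give a flavour of the sort of bite-size, archives-centred exercise I have in mind (this is my own illustrative sketch, not part of any existing tutorial, and the directory is simply whatever you point it at), here are a few lines of Python that survey a folder and count files by extension – the kind of quick format census you might run over a newly received accession:

```python
import os
from collections import Counter

def format_census(root):
    """Count files by extension beneath a directory -
    a rough first survey of a new digital accession."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Normalise case so .TXT and .txt are counted together
            ext = os.path.splitext(name)[1].lower() or "(no extension)"
            counts[ext] += 1
    return counts

if __name__ == "__main__":
    for ext, n in format_census(".").most_common():
        print(f"{ext}: {n}")
```

Run against an accession directory, this gives an immediate sense of what formats you are dealing with – exactly the sort of small, genuinely useful task I think a tutorial for archivists should start from.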

I have absolutely no clue whether or how this might come off.  Maybe the only UK archivists interested are the three of us who talked on Twitter.  Maybe we'll decide it's too much effort to tailor a resource specifically for archivists (and I do have the small matter of a PhD thesis to write over the next few months).  Maybe we'll find there's already something out there that's perfect.  Maybe the consensus will be that archivists' time would be better spent brushing up their markup skills, or learning about database design, or practising palaeography or something.  I just don't know, but UKAD is all about networking and getting people together from different fields but with common interests in archives.  Or, as one archivist tweeted: "Wanted to be able to have halfway-sensible conversation with techies" – now there's a challenge!

It’s been a while since I’ve posted here purely on digital preservation issues: my work has moved in other directions, although I did attend a number of the digital preservation sessions at the Society of American Archivists’ conference this summer.  I retain a keen interest in digital preservation, however, particularly in developments which might be useful for smaller archives.  Recently, I’ve been engaged in a little work for a project called DiSARM (Digital Scenarios for Archives and Records Management), preparing some teaching materials for the Masters students at UCL to work from next term, and in revising the contents of a guest lecture I present to the University of Liverpool MARM students on ‘Digital Preservation for the Small Repository’.  Consequently, I’ve been trying to catch up on the last couple of years (since I left West Yorkshire Archive Service at the end of 2009) of new digital preservation projects and research.

So what's new?  Well, from a small archives perspective, I think the key development has been the emergence of several digital curation workflow management systems – Archivematica, Curator's Workbench, the National Archives of Australia's Digital Preservation Software Platform (others…?) – which package together a number of different tools to guide the archivist through a sequenced set of stages for the processing of digital content.  The currently available systems vary in their approaches to preservation, comprehensiveness, and levels of maturity, but represent a major step forward from the situation just a couple of years ago.  In 2008, if (like me when WYAS took in the MLA Yorkshire archive as a testbed) you didn't have much (or any) money available, your only option was – as one of the former Liverpool students memorably pointed out to me – to cobble together a set of tools as best you could from old socks and a bit of string.  Now we have several offerings approaching an integrated software solution; moreover, these packages are generally open source and freely available, so would-be adopters are able to download each one and play about with it before deciding which one might suit them best.

Having said that, I still think it is important that students (and practitioners, of course) understand the preservation strategies and assumptions underlying each software suite.  When we learn how to catalogue archives, we are not trained merely to use a particular software tool.  Rather, we are taught the principles of archival description, and then we move on to see how these concepts are implemented in practice in EAD or by using specific database applications, such as (in the UK) CALM or Adlib.  For DiSARM, students will design a workflow and attempt to process a small sample set of digital documents using their choice of one or more of the currently available preservation tools, which they will be expected to download and install themselves.  This Do-It-Yourself approach will mirror the practical reality in many small archives, where the (frequently lone) archivist often has little access to professional IT support.  Similarly, students at UCL are not permitted to install software onto the university network.  Rather than see this as a barrier, again I prefer to treat this situation as a reflection of organisational reality.  There are a number of very good reasons why you would not want to process digital archives directly onto your organisation's internal network, and recycling and re-purposing old computer equipment of varying technical specifications and capabilities to serve as workstations for ingest is a fact of life even, it seems, for Mellon-funded projects!

In preparation for writing this DiSARM task, I began to put together for my own reference a spreadsheet listing all the applications I could think of, or have heard referenced recently, which might be useful for preservation processing tasks in small archives.  I set out to record:

  • the version number of the latest (stable) release
  • the licence arrangements for each tool
  • the URL from which the software can be downloaded
  • basic system requirements (essentially the platform(s) on which the software can be run – we have surveyed the class and know there is a broad range of operating systems in use, including several flavours of both Linux and Windows, and Mac OS X)
  • location of further documentation for each application
  • end-user support availability (forums or mailing lists etc)
This all proved surprisingly difficult.  I was half expecting that user-friendly documentation and (especially) support might often be lacking in the smaller projects, but several websites also lack clear statements about system requirements or the legal conditions under which the software may be installed and used.  Does 'educational use and research' cover a local authority archives providing research services to the general public (including academics)?  Probably not, but it would presumably allow for use in a university archives.  Thanks to the wonders of interpreted and virtual machine-based programming languages (mostly Java, but Python also puts in an occasional appearance), many tools are effectively cross-platform, but it is astonishing how many projects fail to say so clearly.  This is self-evident to a developer, of course, but not at all obvious to an archivist, who will probably be worried about bringing coffee into the repository, let alone a reptile.  Oh, and if you expect your software to be compiled from code, or require sundry other faffing around at a command line before use, I'm sorry, but your application is not "easy to implement" for ordinary mortals, as more than one site claimed.  Is it really so hard to generate binary executables for common operating systems (or, if you have a good excuse – such as Archivematica, which is still in alpha development – at least provide detailed step-by-step instructions)?  Many projects of course make use of SourceForge to host code, but use another website for documentation and updates – it can be quite confusing finding your way around.  The venerable ClamAV seems to have undergone some kind of Windows conversion, and although I'm sure the Unix packages must be there somewhere, I'm damned if I could find them easily…
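For what it's worth, the spreadsheet columns listed above lend themselves to a machine-readable register too. A minimal sketch using Python's csv module (the sample entry is a placeholder of my own invention, not a real tool – always check each project's own site for current details):

```python
import csv

# Columns mirror the spreadsheet described above; the sample row is
# illustrative only, not a real application.
FIELDS = ["tool", "latest_stable_version", "licence", "download_url",
          "system_requirements", "documentation_url", "user_support"]

rows = [
    {"tool": "ExampleTool",
     "latest_stable_version": "1.0",
     "licence": "open source (check terms)",
     "download_url": "https://example.org",
     "system_requirements": "Java (cross-platform)",
     "documentation_url": "https://example.org/docs",
     "user_support": "mailing list"},
]

with open("tool_register.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

A register in this form could then be shared, compared and kept up to date far more easily than a private spreadsheet.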

All of which plays into a wider debate about just how far the modern archivist's digital skills ought to reach (there are many other versions of this debate; the one linked – from 2006, so now quite old – just happens to be one of the most comprehensive attempts to define a required digital skill set for information practitioners).  No doubt there will be readers of this post who believe that archivists shouldn't be dabbling in this sort of stuff at all, especially if they also work for an organisation which lacks the resources to establish a reliable infrastructure for a trusted digital repository.  And certainly I've been wondering lately whether some kind of archivists' equivalent of The Programming Historian would be welcome or useful, teaching basic coding tailored to common tasks that an archivist might need to carry out.  But essentially, I don't subscribe to the view that all archivists need to re-train as computer scientists or IT professionals.  Of course, these skills are still needed (obviously!) within the digital preservation community, but to drive a car I don't need to be a mechanic or have a deep understanding of transport infrastructure.  Digital preservation needs to open up spaces around the periphery of the community where newcomers can experiment and learn, otherwise it will become an increasingly closed and ultimately moribund endeavour.

8am on Saturday morning, and those hardy souls who have not yet fled to beat Hurricane Irene home or who are stranded in Chicago, plus other assorted insomniacs, were presented with a veritable smörgåsbord of digital preservation goodness.  The programme has many of the digital sessions scheduled at the same time, and today I decided not to session-hop but stick it out in one session in each of the morning’s two hour-long slots.

My first choice was session 502, Born-Digital Archives in Collecting Repositories: Turning Challenges into Byte-Size Opportunities, primarily an end-of-project report on the AIMS Project.  It's been great to see many such practical digital preservation sessions at this conference, although I do slightly wonder what it will take before working with born-digital truly becomes part of the professional mainstream.  Despite the efforts of all the speakers at sessions like this (and, in the UK, colleagues at the Digital Preservation Roadshows with which I was involved, and more recent similar events), there still appears to be a significant mental barrier which stops many archivists from giving it a go.  As the session chair began her opening remarks this morning, a woman behind me remarked "I'm lost already".

There may be some clues in the content of this morning's presentations: in amongst my other work (as would be the case for most archivists, I guess) I try to keep reasonably up-to-date with recent developments in practical digital preservation.  For instance, I was already well aware of the AIMS Project, although I'd not had a previous opportunity to hear about their work in any detail, but here were yet more new suggested tools for digital preservation.  I happen to know of FTK Imager, having used it with the MLA Yorkshire archive accession, although what wasn't stated was that the full FTK forensics package is damn expensive, and the free FTK Imager Lite (scroll down the page for links) is an adequate and more realistic proposition for many cash-strapped archives.  BagIt is familiar too, but Bagger, a graphical user interface to the BagIt Library, is new since I last looked (I'll add links later – the Library of Congress site is down for maintenance).  Sleuthkit was mentioned at the research forum earlier this week, but fiwalk ("a program that processes a disk image using the SleuthKit library and outputs its results in Digital Forensics XML") was another new one on me, and there was even talk in this session of hardware write-blockers.  All this variety is hugely confusing for anybody who has to fit digital preservation around another day job, not to mention potentially expensive when it comes to buying hardware and software, and the skills necessary to install and maintain such a jigsaw puzzle system.  As the project team outlined their wish list for yet another application, Hypatia, I couldn't help wondering whether we can't promote a little more convergence between all these different tools, both digital preservation-specific and more general.
For instance, the requirement for a graphical drag 'n' drop interface to help archivists create the intellectual arrangement of a digital collection and add metadata reminded me very much of recent work at Simmons College on a graphical tool to help teach archival arrangement and description (whose name I forget, but will add when it comes back to me!*).  I was particularly interested in the 'access' part of this session, especially the idea that FTK's bookmark and label functions could be transformed into user-generated content tools, to enable researchers to annotate and tag records, and in the use of network graphs as a visual finding aid for email collections.
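On the convergence point, it is worth saying that some of these building blocks are simpler than their websites make them look. The core of BagIt, for example, is just a payload directory plus plain-text checksum manifests. Here is a stdlib-only Python sketch of the manifest-writing step (the directory layout is hypothetical, and for real bags you should of course use the Library of Congress's own BagIt tools rather than this toy):

```python
import hashlib
import os

def sha256_of(path, chunk_size=65536):
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(bag_dir):
    """Write a BagIt-style manifest-sha256.txt for files under data/."""
    data_dir = os.path.join(bag_dir, "data")
    lines = []
    for dirpath, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            # BagIt manifests record paths relative to the bag, with
            # forward slashes regardless of operating system
            rel = os.path.relpath(path, bag_dir).replace(os.sep, "/")
            lines.append(f"{sha256_of(path)}  {rel}")
    manifest = os.path.join(bag_dir, "manifest-sha256.txt")
    with open(manifest, "w") as f:
        f.write("\n".join(lines) + "\n")
    return manifest
```

Nothing magic: a later fixity check simply recomputes each checksum and compares it against the manifest.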

The rabbit-caught-in-headlights problem seems less acute for archivists jumping on the Web 2.0 bandwagon, which was the theme of session 605, Acquiring Organizational Records in a Social Media World: Documentation Strategies in the Facebook Era, where we heard about the use of social media, primarily Facebook, to contact and document student activities and student societies in a number of university settings, and from a university archivist just beginning to dip her toe into Twitter.  As the speakers outlined a strategy of working directly with student organisations and providing training to 'student archivists', as a method of enabling the capture of social media content (both simultaneously with upload and by web-crawling sites afterwards), I was reminded of my own presentation at this conference: surely here is another example of real-life community development? The archivist is deliberately 'going out to where the community is' and adapting to the community norms and schedules of the students themselves, rather than expecting the students to comply with archival rules and expectations.

This afternoon I went to learn about SNAC: the social networks and archival context project (session 710), something I’ve been hearing other people mention for a long time now but knew little about.  SNAC is extracting names (corporate, personal, family) from Encoded Archival Description (EAD) finding aids as EAC-CPF and then matching these together and with pre-existing authority records to create a single archival authorities prototype.  The hope is to then extend this authorities cooperative both nationally and potentially internationally.
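To picture the extraction step (a deliberately simplified toy of my own, not SNAC's actual pipeline – real EAD is namespaced and far messier), pulling name elements out of a finding aid might look something like this:

```python
import xml.etree.ElementTree as ET

# A simplified, namespace-free EAD fragment; real finding aids are
# richer and usually namespaced - this is illustration, not SNAC's code.
FINDING_AID = """
<ead>
  <archdesc>
    <controlaccess>
      <persname>Fawcett, Millicent Garrett, 1847-1929</persname>
      <corpname>National Union of Women's Suffrage Societies</corpname>
      <persname>Strachey, Ray, 1887-1940</persname>
    </controlaccess>
  </archdesc>
</ead>
"""

def extract_names(ead_xml):
    """Collect personal, corporate and family names from an EAD document."""
    root = ET.fromstring(ead_xml)
    names = []
    for tag in ("persname", "corpname", "famname"):
        for el in root.iter(tag):
            names.append((tag, el.text.strip()))
    return names

for kind, name in extract_names(FINDING_AID):
    print(kind, "->", name)
```

The genuinely hard part, of course, is what SNAC does next: matching the extracted strings against each other and against existing authority records.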

My sincere thanks to the Society of American Archivists for their hospitality during the conference, and once again to those who generously funded my trip – the Archives and Records Association, University College London Graduate Conference Fund, UCL Faculty of Arts and UCL Department of Information Studies.

* UPDATE: the name of the Simmons’ archival arrangement platform is Archivopteryx (not to be confused with the Internet mail server Archiveopteryx which has an additional ‘e’ in the name)

#SAA11, Friday 26th August

Friday had a bit of a digital theme for me, beginning with a packed, standing-room-only session 302, Practical Approaches to Born-Digital Records: What Works Today. After a witty introduction by Chris Prom about his Fulbright research in Dundee, a series of speakers introduced their digital preservation work, with a real emphasis on 'you too can do this'.  I learnt about a few new tools: Firefly, a tool which is used to scan for American social security numbers and other sensitive information – not much use in a British context, I imagine, but an interesting approach all the same; and TreeSize Professional, a graphical hard disk analyser.  Several projects were also making use of the Duke Data Accessioner, a tool with which I was already familiar but have never used.  During the morning session, I also popped in and out of 'team-Brit' session 304, Archives in the Web of Data, which discussed developments in the UK and US in opening up and linking together archival descriptive data, and session 301, Archives on the Go: Using Mobile Technologies for Your Collections, where I caught a presentation on the use of FourSquare at Stanford University.
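For the curious, a British equivalent of that SSN-scanning approach might look for National Insurance numbers instead. A rough sketch of my own (the pattern is deliberately loose – the real NI number rules exclude certain prefix letters, so this will over-match slightly, which is arguably fine when the aim is only to flag material for human review):

```python
import re

# Rough pattern for UK National Insurance numbers, e.g. "QQ 12 34 56 C".
# The real rules exclude some prefix letters; this simplified pattern
# over-matches a little - acceptable for flagging files for review.
NINO = re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b")

def flag_sensitive(text):
    """Return candidate NI numbers found in a piece of text."""
    return NINO.findall(text)

sample = "Correspondence re. pension claim, NI number QQ 12 34 56 C, 1978."
print(flag_sensitive(sample))  # → ['QQ 12 34 56 C']
```

Run over a batch of text extracted from an accession, something like this could produce a shortlist of files for an archivist to check before release.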

In the afternoon, I mostly concentrated on session 401, Re-arranging Arrangement and Description, with a brief foray into session 407, Faces of Diversity: Diasporic Archives and Archivists in the New Millennium.  Unless I missed this whilst I was out at the other session, nobody in session 401 mentioned the series system as a possible alternative or resolution to some of the tensions identified in a strict application of hierarchically-interpreted original order, which surprised me.  There were some hints towards a need for a more object-oriented view of description in a digital environment, and of methods of addressing the complexity of having multiple representations (physical, digital etc.), but I have been reading my UCL colleague Jenny Bunn's recently completed PhD thesis, Multiple Narratives, Multiple Views: Observing Archival Description, on flights for this trip, which would have added another layer to the discussion in this session.

And continuing the digital theme, I was handed a flyer for an event coming later this year (on 6th October): Day of Digital Archives which might interest some UK colleagues.  This is

…an initiative to raise awareness of digital archives among both users and managers. On this day, archivists, digital humanists, programmers, or anyone else creating, using, or managing digital archives are asked to devote some of their social media output (i.e. tweets, blog posts, youtube videos etc.) to describing their work with digital archives.  By collectively documenting what we do, we will be answering questions like: What are digital archives? Who uses them? How are they created and managed? Why are they important?


Day 1 Proper of the conference began with acknowledgements to the organisers, some kind of raffle draw and then a plenary address by an American radio journalist.  Altogether this conference has a celebratory feel to it – fitting since this is SAA’s 75th Anniversary year, but very different in tone from the UK conferences where the opening keynote speaker tends to be some archival luminary.  More on the American archival cultural experience later.

My session with Kate Theimer (of ArchivesNext fame) and Dr Elizabeth Yakel from the University of Michigan (probably best known amongst tech savvy UK practitioners for her work on the Polar Bear Expedition Finding Aid) followed immediately afterwards, and seemed to go well.  The session title was: “What Happens After ‘Here Comes Everybody’: An Examination of Participatory Archives”.  Kate proposed a new definition for Participatory Archives, distinguishing between participation and engagement (outreach); Beth spoke about credibility and trust, and my contribution was primarily concerned with contributors’ motivations to participate.  A couple of people, Lori Satter and Mimi Dionne have already blogged about the session (did I really say that?!), and here are my slides:

After lunch, I indulged in a little session-hopping, beginning in session 204, hearing about Jean Dryden's copyright survey of American institutions, which asked whether copyright limits access to archives by restricting digitisation activity.  Dryden found that American archivists tended to take a very conservative approach to copyright expiry terms and obtaining third party permission for use, even though many interviewees felt that it would be good to take a bolder line.  Also, some archivists' knowledge of American copyright law was shaky – sounds familiar!  It would be interesting to see how UK attitudes would compare; I suspect results would be similar.  However, I also wonder how easy it is in practical terms to suddenly start taking more of a risk-management approach to copyright after many years of insisting upon strict copyright compliance.

Next I switched to session 207, The Future is Now: New Tools to Address Archival Challenges, hearing Maria Esteva speak about some interesting collaborative work between the Texas Advanced Computing Center and NARA on visual finding aids, similar to the Australian Visible Archive research project. At the Exhibit Hall later, I picked up some leaflets about other NARA Applied Research projects and tools for file format conversion, data mining and record type identification which were discussed by other speakers in this session.

Continuing the digitisation theme, although with a much more philosophical focus, Joan Schwartz in session 210, Genuine Encounter, Authentic Relationships: Archival Covenant & Professional Self-Understanding, discussed the loss of materiality and context resulting from the digitisation of photographs (for example, a thumbnail image presented out of its album context).  She commented that archivists are often too caught up with the 'how' of digitisation rather than the 'why'.  I wouldn't disagree with that.

Back to the American archival cultural experience, I was invited to the Michigan University ‘alumni mixer’ in the evening – a drinks reception with some short speeches updating alumni on staff news and recent developments in the archival education courses at the university.  All in all, archives students are much in evidence here: there are special student ‘ribbons’ to attach to name badges, many students are presenting posters on their work, and there is a special careers area where face-to-face advice is available from more senior members of SAA, current job postings are advertised, and new members of the profession can even pin up their curriculum vitae.  Some of this (the public posting of CVs in particular) might seem a bit pushy for UK tastes, and the one year length of UK Masters programmes (and the timing of Conference) of course precludes the presentation of student dissertation work.  But the general atmosphere seems very supportive of new entrants to the profession, and I feel there are ideas here that ARA’s New Professionals section might like to consider for future ARA Conferences.

A meeting connected to my research for me, followed by a little sight-seeing as I was not involved in any of the day’s events.

It's interesting to compare how SAA organises their annual meeting with the much smaller ARA event.  On the days running up to the main conference, SAA arranges a series of training workshops and group committee meetings, and some poor archivists are even taking their Certified Archivists examination.  This means that delegates arrive at different times over the first few days, and it is only tomorrow (Thursday) that the full size of the conference is revealed with the first plenary session.  Some of the pre-conference events carry an extra charge for attendees.  I understand these are an important source of income for SAA, but I imagine they work out as a cost-effective way of attending training for the delegates, since they are already travelling to get to the main Annual Meeting itself.  I guess many ARA sections also hold committee meetings at Conference, but that is more of an informal arrangement.  I wondered if formalising it might simultaneously help save some costs for ARA and boost attendance, but I think I'd switch the order so that such add-on events followed the main conference rather than preceded it, as happens here – as a first-time attendee, I find the gradual ramping up towards the main event quite disorientating.

Oh, and in the evening, a mass outing to watch the Chicago Cubs baseball game – and the home team won!  Fortunately, I had a very patient explainer…

This should be the first of several posts from this year's Society of American Archivists Annual Meeting in Chicago, for which I have received generous funding to attend from UCL's Graduate Conference Fund, and from the Archives and Records Association who asked me to blog the conference.  First impressions of a Brit: this conference is huge.  I could (and probably will) get lost inside the conference hotel, and the main programme involves parallel tracks of ten sessions at once.  And proceedings start at 8am.  This is all a bit of a shock to the system; not sure anybody would turn up if you started before 9am at the earliest back home! Anyway, the Twitter hashtag to watch is #saa11, although with no wifi in the session rooms, live coverage of sessions will be limited to those who can get a mobile phone signal, which is a bit of a shame.

The conference proper starts on Thursday; the beginning of the week is mostly taken up with meetings, but on Tuesday I attended an impressive range of presentations at the SAA Research Forum.  Abstracts and bios for each speaker are already online (and are linked where relevant below), and I understand that slides will follow in the next week or so.  Here are some personal highlights and things which I think may be of interest to archivists back home in the UK:

It was interesting to see several presentations on digital preservation, many reflecting similar issues and themes to those which inspired my Churchill Fellowship research and the beginning of this blog back in 2008.  Whilst I don't think I'd recommend anyone set out to learn about digital preservation techniques the hard way with seriously obsolete media, if you do find yourself in the position of having to deal with 5.25 inch floppy disks or the like, Karen Ballingher's presentation on students' work at the University of Texas at Austin had some handy links, including the UT-iSchool Digital Archaeology Lab Manual and related documentation and an open source forensics package called Sleuth Kit.  Her conclusions were more generally applicable, and familiar: the importance of documenting everything you do, including failures; planning out trials; and just doing it – learning by doing a real digital preservation project.  Cal Lee was excellent (as ever) on Levels of Representation in Digital Collections, outlining a framework of digital information constructed of eight layers of representation, from the bit- (or byte-) stream to aggregations of digital objects, and noting that archival description already supports description at multiple levels but has not yet evolved to address these multiple representation layers.
Eugenia Kim's paper on her ChoreoSave project, to determine the metadata elements required for digital dance preservation, reminded me of several UK and European initiatives: Siobhan Davies Replay, which Eugenia herself referenced and talked about at some length; the University of the Arts London's John Latham Archive, which I've blogged about previously – because Eugenia commented that choreographers had found the task of entering data into the numerous metadata fields onerous, and once again it seems to me there is a tension between the (dance, in this case) event and the assumption that text offers the only or best means of describing and accessing that event; and the CASPAR research on the preservation of interactive multimedia performances at the University of Leeds.

For my current research work on user participation in archives, the following papers were particularly relevant.  Helice Koffler reported on the RLG Social Metadata Working Group's project evaluating the impact of social media on museums, libraries and archives.  A three-part report is to be issued; Part 1 is due for publication in September 2011, and I understand that this will include some useful and much-needed definitions of 'user interaction' terminology.  Part 1 has moderation as its theme – Helice commented that a strict moderation policy can act as a barrier to participation (a point with which I agree, up to a point – and will explore further in my own paper on Thursday).  Part 2 will be an analysis of the survey of social media use undertaken by the Working Group (4 UK organisations were involved in this, although none were archives).  As my interviews with archivists would also suggest, the survey found little evidence of serious problems with spam or abusive behaviour on MLA contributory platforms.  Ixchel Faniel reported on University of Michigan research on whether trust matters for re-use decisions.

With my UKAD hat on, the blue sky (sorry, I hate that term, but I think it's appropriate in this instance) thinking on archival description methods which emerged from the Radcliffe Workshop on Technology and Archival Processing was particularly inspiring.  The workshop was a two-day event which brought together invited technologists (many of whom had not previously encountered archives at all) and archivists to brainstorm new thinking on ways to tackle cataloguing backlogs, streamline cataloguing workflows and improve access to archives.  A collections exhibition was used to spark discussion, together with specially written use cases and scenarios to guide each day's discussion.  Suggestions included the use of foot pedal-operated overhead cameras to enable archival material to be digitised either at the point of accessioning or during arrangement and description, and experimenting with 'trusted crowdsourcing' – asking archivists to check documents for sensitivity – as a first step towards automating the redaction of confidential information.  These two suggestions reminded me of two recent projects at The National Archives in the UK – John Sheridan's work to promote expert input into legislation.gov.uk (does anyone have a better link?) and the proposal to use text mining on closed record series which was presented to DSG in 2009.  Adam Kreisberg presented on the development of a toolkit for running focus groups by the Archival Metrics Project.  The toolkit will be tested with a sample session based upon archives' use of social media, which I think could be very valuable for UK archivists.

Finally – only because I couldn't fit this one into any of the categories above – I found Heather Soyka and Eliot Wilczek's questions on how modern counter-insurgency warfare can be documented intriguing and thought-provoking.
