Posts Tagged ‘TNA’

This should be the first of several posts from this year's Society of American Archivists Annual Meeting in Chicago, which I am attending with generous funding from UCL's Graduate Conference Fund and from the Archives and Records Association, who asked me to blog the conference.  First impressions of a Brit: this conference is huge.  I could (and probably will) get lost inside the conference hotel, and the main programme involves ten parallel session tracks at once.  And proceedings start at 8am.  This is all a bit of a shock to the system; I'm not sure anybody would turn up back home if you started before 9am at the earliest!  Anyway, the Twitter hashtag to watch is #saa11, although with no wifi in the session rooms, live coverage of sessions will be limited to those who can get a mobile phone signal, which is a bit of a shame.

The conference proper starts on Thursday; the beginning of the week is mostly taken up with meetings, but on Tuesday I attended an impressive range of presentations at the SAA Research Forum.  Abstracts and bios for each speaker are already online (and are linked where relevant below), and I understand that slides will follow in the next week or so.  Here are some personal highlights and things which I think may be of interest to archivists back home in the UK:

It was interesting to see several presentations on digital preservation, many reflecting similar issues and themes to those which inspired my Churchill Fellowship research and the beginning of this blog back in 2008.  Whilst I wouldn't recommend anyone set out to learn digital preservation techniques the hard way, with seriously obsolete media, if you do find yourself having to deal with 5.25 inch floppy disks or the like, Karen Ballingher's presentation on students' work at the University of Texas at Austin had some handy links, including the UT-iSchool Digital Archaeology Lab Manual and related documentation, and an open source forensics package called Sleuth Kit.  Her conclusions were more generally applicable, and familiar: document everything you do, including failures; plan out trials; and just do it – learn by doing a real digital preservation project.  Cal Lee was excellent (as ever) on Levels of Representation in Digital Collections, outlining a framework of digital information constructed of eight layers of representation, from the bit- (or byte-)stream up to aggregations of digital objects, and noting that archival description already supports description at multiple levels but has not yet evolved to address these multiple representation layers.
Eugenia Kim's paper on her ChoreoSave project, which aims to determine the metadata elements required for digital dance preservation, reminded me of several UK and European initiatives: Siobhan Davies Replay, which Eugenia herself referenced and discussed at some length; the University of the Arts London's John Latham Archive, which I've blogged about previously (relevant here because Eugenia commented that choreographers had found entering data into the numerous metadata fields onerous, and once again it seems to me there is a tension between the event itself, dance in this case, and the assumption that text offers the only or best means of describing and accessing that event); and the CASPAR research on the preservation of interactive multimedia performances at the University of Leeds.

For my current research on user participation in archives, the following papers were particularly relevant.  Helice Koffler reported on the RLG Social Metadata Working Group's project evaluating the impact of social media on museums, libraries and archives.  A three-part report is to be issued, with Part 1 due for publication in September 2011; I understand that this will include some useful and much-needed definitions of 'user interaction' terminology.  Part 1 has moderation as its theme – Helice commented that a strict moderation policy can act as a barrier to participation (a view I agree with, up to a point, and will explore further in my own paper on Thursday).  Part 2 will analyse the survey of social media use undertaken by the Working Group (four UK organisations were involved, although none were archives).  As my own interviews with archivists would also suggest, the survey found little evidence of serious problems with spam or abusive behaviour on MLA contributory platforms.  Ixchel Faniel reported on University of Michigan research into whether trust matters for re-use decisions.

With my UKAD hat on, the blue-sky (sorry, I hate that term, but I think it's appropriate in this instance) thinking on archival description methods which emerged from the Radcliffe Workshop on Technology and Archival Processing was particularly inspiring.  The workshop was a two-day event which brought together invited technologists (many of whom had not previously encountered archives at all) and archivists to brainstorm new ways to tackle cataloguing backlogs, streamline cataloguing workflows and improve access to archives.  A collections exhibition was used to spark discussion, together with specially written use cases and scenarios to guide each day's debate.  Suggestions included using foot-pedal-operated overhead cameras so that archival material can be digitised either at the point of accessioning or during arrangement and description, and experimenting with 'trusted crowdsourcing' – asking archivists to check documents for sensitivity – as a first step towards automating the redaction of confidential information.  These last two suggestions reminded me of two recent projects at The National Archives in the UK: John Sheridan's work to promote expert input into legislation.gov.uk (does anyone have a better link?) and the proposal, presented to DSG in 2009, to use text mining on closed record series.  Adam Kreisberg presented on the Archival Metrics Project's development of a toolkit for running focus groups.  The toolkit will be tested with a sample session based upon archives' use of social media, which I think could be very valuable for UK archivists.

Finally (only because I couldn't fit this one into any of the categories above), I found Heather Soyka and Eliot Wilczek's questions about how modern counter-insurgency warfare can be documented intriguing and thought-provoking.


This post is a thank you to my followers on Twitter, for pointing me towards many of the examples given below.  The thoughts on automated description and transcription are a preliminary sketching out of ideas (which, I suppose, is a way of excusing myself if I am not coherent!), on which I would particularly welcome comments or further suggestions:

A week or so before Easter, I was reading a paper about the classification of galaxies on the astronomical crowdsourcing website, Galaxy Zoo.  The authors use a statistical (Bayesian) analysis to distil an accurate sample of data, and then compare the reliability of this crowdsourced sample to classifications produced by expert astronomers.  The article also refers to the use of sample data in training artificial neural networks in order to automate the galaxy classification process.
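The paper's actual Bayesian machinery is more elaborate than anything I could reproduce here, but the core idea (give more weight to volunteers who tend to agree with the emerging consensus) can be sketched in a few lines. All names below are mine, purely for illustration:

```python
from collections import Counter, defaultdict

def weighted_consensus(votes, rounds=3):
    """votes: list of (user, item, label) classifications.
    Returns {item: consensus label}.

    Iteratively re-weights each volunteer by how often they agree with
    the current consensus, then recomputes the weighted majority vote.
    A crude cousin of the Bayesian approach, not the paper's method.
    """
    weight = defaultdict(lambda: 1.0)
    consensus = {}
    for _ in range(rounds):
        # weighted tally per item
        tallies = defaultdict(Counter)
        for user, item, label in votes:
            tallies[item][label] += weight[user]
        consensus = {item: t.most_common(1)[0][0] for item, t in tallies.items()}
        # a user's new weight is their agreement rate with the consensus
        agree, total = defaultdict(float), defaultdict(int)
        for user, item, label in votes:
            total[user] += 1
            agree[user] += (consensus[item] == label)
        for user in total:
            weight[user] = max(agree[user] / total[user], 0.1)  # floor, never zero
    return consensus
```

With real data the weighting would need priors and per-class error rates, but even this crude version demotes a systematically contrary classifier.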

This set me thinking about archivists' approaches to online user participation and the harnessing of computing power to solve problems in archival description.  On the whole, I would say that archivists (and our partners on 'digital archives' kinds of projects) have been rather hamstrung by a restrictive, 'human-scale', qualitatively evaluated vision of what might be achievable through the application of computing technology to such issues.

True, the notion of an Archival Commons evokes a network-oriented archival environment.  But although the proponents of this concept recognise “that the volume of records simply does not allow for extensive contextualization by archivists to the extent that has been practiced in the past”, the types of ‘functionalities’ envisaged to comprise this interactive descriptive framework still mirror conventional techniques of description in that they rely upon the human ability to interpret context and content in order to make contributions imbued with “cultural meaning”.  There are occasional hints of the potential for more extensible (?web scale) methods of description, in the contexts of tagging and of information visualization, but these seem to be conceived more as opportunities for “mining the communal provenance” of aggregated metadata – so creating additional folksonomic structures alongside traditional finding aids.  Which is not to say that the Archival Commons is not still justified from a cultural or societal perspective, but that the “volume of records” cataloguing backlog issue will require a solution which moves beyond merely adding to the pool of potential participants enabled to contribute narrative descriptive content and establish contextual linkages.

Meanwhile, double-keying, checking and data standardisation procedures in family history indexing have come a long way since the debacle over the 1901 census transcription. But double-keying for a commercial partner also signals a doubling of transcription costs, possibly without a corresponding increase in transcription accuracy.  Or, as the Galaxy Zoo article puts it, “the overall agreement between users does not necessarily mean improvement as people can agree on a wrong classification”.  Nevertheless, these norms from the commercial world have somehow transferred themselves as the ‘gold standard’ into archival crowdsourcing transcription projects, in spite of the proofreading overhead (bounded by the capacity of the individual, again).  As far as I am aware, Old Weather (which is, of course, a Zooniverse cousin of Galaxy Zoo) is the only project working with archival content which has implemented a quantitative approach to assess transcription accuracy – improving the project’s completion rate in the process, since the decision could be taken to reduce the number of independent transcriptions required from five to three.
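A back-of-envelope calculation (my own, not Old Weather's actual analysis) shows why reducing redundancy can be safe: if each transcriber reads a word correctly with probability p, the chance that a majority of k independent transcribers agrees on the right reading is easy to compute.

```python
from math import comb

def majority_correct(p, k):
    """Probability that a strict majority of k independent transcribers,
    each reading a word correctly with probability p, agree on the right
    reading (ties and minority outcomes count as failure)."""
    need = k // 2 + 1  # smallest strict majority
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(need, k + 1))
```

With p = 0.9, five transcribers give roughly 0.991 and three give roughly 0.972: a small price for a 40% cut in transcription effort, once individual transcribers are known to be reliable.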

Pondering these and other such tangles, I began to wonder whether there have been any genuine attempts to harness large-scale processing power for archival description or transcription.  Commercial tools designed to decipher modern handwriting are now available (two examples: MyScript for LiveScribe and Evernote's text recognition tool), so why not an automated palaeographical tool?  Vaguely remembering that The National Archives had once been experimenting with text mining for both cataloguing and sensitivity classification [I do not know what happened to this project – can anyone shed some light on it?], and recollecting the determination of one customer at West Yorkshire Archive Service who valiantly tried (and failed) to teach his Optical Character Recognition (OCR) software to recognise nearly four centuries of clerks' handwriting in the West Riding Registry of Deeds indexes, I put out a tentative plea on Twitter for further examples of archival automation.  The following examples are the pick of the amazing set of responses I received:

  • The Muninn Project aims to extract and classify written data about the First World War from digitized documents using raw computing power alone.  The project appears to be at an early stage, and is beginning with structured documents (those written onto pre-printed forms) but hopes to move into more challenging territory with semi-structured formats at a later stage.
  • The Dutch Monk Project (not to be confused with the American project of the same name, which facilitates text mining in full-text digital library collections!) seeks to make use of the qualitative interventions of participants playing an online transcription correction game in order to train OCR software for improved handwriting recognition rates in future.  The project tries to stimulate user participation through competition and rewards, following the example of Google Image Labeller.  If your Dutch is good, Christian van der Ven’s blog has an interesting critique of this project (Google’s attempt at translation into English is a bit iffy, but you can still get the gist).
  • Impact is a European-funded project which takes a similar approach to the Monk project, but has focused upon improving automated text recognition for early printed books.  The project has produced numerous tools to improve both OCR image recognition and lexical information retrieval, and a web-based collaborative correction platform for accuracy verification by volunteers.  The input from these volunteers can then in turn be used to further refine the automated character recognition (see the videos on the project's YouTube channel for some useful introductory materials).  Presumably these techniques could be further adapted to help with handwriting recognition, perhaps beginning with the more stylised court hands, such as Chancery hand.  The division of the quality-control checks into separate character-, word- and page-level tasks (as illustrated in this video) is especially interesting, although I would want to take this further and partition the labour on each of the different tasks as well, rather than expecting one individual to work sequentially through each step.  Thinking of myself as a potential volunteer checker, I suspect I would get bored and give up at the letter-checking stage.  Perhaps this rather more mundane task would be more effectively offered in return for a peppercorn payment as a 'human intelligence task' on a platform such as Amazon Mechanical Turk, whilst volunteer time could be reserved for the more interesting word- and page-level checking.
  • Genealogists are always ahead of the game!  The Family History Technology Workshop held annually at Brigham Young University usually includes at least one session on handwriting recognition and/or data extraction from digitized documents.  I’ve yet to explore these papers in detail, but there looks to be masses to read up on here.
  • Wot no catalogue? Google-style text search within historic manuscripts?  The Center for Intelligent Information Retrieval (University of Massachusetts Amherst) handwriting retrieval demonstration systems – manuscript document retrieval on the fly.
  • Several other tools and projects which might be of interest are listed in this handy Google Doc on Transcribing Handwritten Documents put together by attendees at the DHapi workshop held at the Maryland Institute for Technology in the Humanities earlier this year.  Where I've not mentioned specific examples directly here it's mostly because they are online user transcription interfaces, which for the purposes of this post I'm classing as technology-enhanced projects, as opposed to the technology-driven projects which are my main focus here (if that makes sense? Monk and Impact creep in above because they combine both approaches).
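The feedback loop that Monk and Impact describe, with volunteer corrections feeding back into recognition, can be caricatured in a few lines.  Real systems retrain image classifiers; this toy version of mine merely learns the most common character substitutions from aligned correction pairs:

```python
from collections import Counter, defaultdict

class CorrectionModel:
    """Learns character substitutions from volunteer corrections and
    applies the most common fix to future OCR output.  A toy stand-in
    for the retraining loop; assumes each correction aligns with the
    OCR output character-for-character."""

    def __init__(self):
        self.subs = defaultdict(Counter)  # wrong char -> Counter of fixes

    def learn(self, ocr, corrected):
        for wrong, right in zip(ocr, corrected):
            if wrong != right:
                self.subs[wrong][right] += 1

    def apply(self, ocr):
        out = []
        for ch in ocr:
            fixes = self.subs.get(ch)
            out.append(fixes.most_common(1)[0][0] if fixes else ch)
        return "".join(out)
```

Two volunteer corrections teaching the model that '0' should have been read as 'o' are enough for it to fix the same confusion in a page it has never seen.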

If you know of other examples, please leave a comment…


Digital Connections: new methodologies for British history, 1500-1900

I spent an enjoyable afternoon yesterday (a distinct contrast, I might add, to the rest of my day, but that is another story) at the Digital Connections workshop at the Institute of Historical Research in London, which introduced two new resources for historical research: the federated search facility, Connected Histories, and the Mapping Crime project to link crime-related documents in the John Johnson collection of ephemera at the Bodleian Library in Oxford to related external resources.

After a welcome from Jane Winters, Tim Hitchcock kicked off proceedings with an enthusiastic endorsement of Connected Histories, and generally of all things digital and history-related, in Towards a history lab for the digital past.  I guess I fundamentally disagree with the suggestion that concepts of intellectual property might survive unchallenged in some quarters (in fact, I think the idea is contradicted by Tim's comments on the Enlightenment inheritance and the 'authorship' silo).  But then again, we won't challenge the paywall by shunning it altogether, and in that sense Connected Histories' 'bridges' to the commercial digitisation providers are an important step forward.  It will be interesting to see how business models evolve in response – there were indications yesterday that some providers may be considering short-term access passes, like British Newspapers 1800-1900 at the British Library, where you can purchase a 24-hour or 7-day pass if you do not have an institutional affiliation.  Given the number of North American accents in evidence yesterday afternoon, there will also be some pressure on online publishers to open up access to their resources to overseas users and beyond UK Higher Education institutions.

For me, the most exciting parts of the talk, and of the ensuing demonstration-workshop led by Bob Shoemaker, related to the Connected Histories API (which seems to be a little bit of a work-in-progress), which prompted an interesting discussion about the technical skills required for contemporary historical research; and the eponymous 'Connections', a facility for saving, annotating and (if desired) publicly sharing Connected Histories search results.  The reception in the room was overwhelmingly positive – I'll be fascinated to see if Connected Histories can succeed where other tools have failed in getting academic historians to be more sociable about their research and expertise.  Connected Histories is not, in fact, truly a federated search platform: indexes for each participating resource have been re-created by the Connected Histories team, and then link back to the original source.  With the API, this will really open up access to many resources which were designed for human interrogation only, and I am particularly pleased that several commercial providers have been persuaded to sign up to this model.  It does, though, add to the complexity of keeping Connected Histories itself up to date: there are plans to crawl contributing websites every six months to detect changes.  This seems to me quite labour-intensive, and I wonder how sustainable it will prove to be, particularly as the project team plan to add yet more resources to the site in the coming months and welcome enquiries from potential content providers (with an interesting charging model to cover the costs of including new material).  This September's updates are planned to include DocumentsOnline from The National Archives, and there were calls from the audience yesterday to include catalogue data from local archives and museums.
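I don't know how the Connected Histories team plan to implement their six-monthly crawl, but the simplest change-detection approach I can imagine is to fingerprint each harvested page and re-index only those whose content hash has changed.  A sketch, with all names my own invention:

```python
import hashlib

def fingerprint(text):
    """Hash a page's extracted text so a periodic crawl can detect
    changed records without re-indexing everything."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed(old_index, pages):
    """old_index: url -> fingerprint from the previous crawl.
    pages: url -> freshly harvested text.
    Returns the urls whose content is new or has changed."""
    return [url for url, text in pages.items()
            if old_index.get(url) != fingerprint(text)]
```

Only the changed pages would then need their descriptions re-extracted and re-indexed, which is presumably where the real labour-intensive work lies.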

Without wishing to sound as dismissive as this possibly will, David Tomkins' talk about the Mapping Crime project was a pretty good illustration of what can be done when you have a generous JISC grant and a very small collection.  Coming from (in my working background, at least) a world of extremely large, poorly documented collections, where no JISC-equivalent funder is available, I was more interested in the generic tools provided for users of the John Johnson collection: permanent URIs for each item, citation download facilities, a personal, hosted user space within the resource, and even a scalable measuring tool for digitised documents.  I wonder why it is taking archival management software developers so long to provide these kinds of tools for users of online archive catalogues?  There was also a fascinating exposé of broadsheet plagiarism, revealed by the digitisation and linking of two sensationalist crime reports which were identical in every detail apart from the dates of publication and the names of those involved.  A wonderful case study in archival authenticity.

David Thomas’ keynote address was an entertaining journey through 13 years of online digitisation effort, via the rather more serious issues of sustainability and democratization of our digital heritage.  His conclusions, that the future of history is about machine-to-machine communication, GIS and spatial data especially, might have come as a surprise to the customary occupants of the IHR’s Common Room, but did come with a warning of the problems attached to the digital revolution from the point of view of ordinary citizens and users: the ‘google issue’ of search results presented out of context; the maze of often complex and difficult-to-interpret online resources; and the question of whether researchers have the technical skills to fully exploit this data in new ways.


A round-up and some brief reflections on a number of different events and presentations I’ve attended recently:

Many of this term's Archives and Society seminars at the Institute of Historical Research have been on particularly pertinent subjects for me, and rather gratifyingly have attracted bumper audiences (we ran out of chairs at the last one I attended).  I've already blogged here about the talk on the John Latham Archive.  Presentations by Adrian Autton and Judith Bottomley from Westminster Archives, and by Nora Daly and Helen Broderick from the British Library, revealed an increasing awareness of and interest in the use of social media in archives, qualified by a growing realisation that such initiatives are not self-sustaining: they require a substantial commitment from archive staff, in time if not necessarily in financial terms, if they are to be successful.  Nora and Helen's talk also prompted an intriguing audience debate about the 'usefulness' of user contributions.  To me, this translates as 'why don't users behave like archivists?' (or possibly like academic historians).  But if the aim of promoting archives through social media is to attract new audiences, as is often claimed, surely we have to expect and celebrate the different perspectives these users bring to our collections.  Our professional training perhaps gives us tunnel vision when it comes to assessing the impact of users' tagging and commenting.  Just because users' terminology cannot easily be matched to the standardised metadata elements of ISAD(G) doesn't mean it lacks relevance or usefulness outside of archival contexts.  Similar observations have been made in research in the museums and art galleries world, where large proportions of the tags contributed to the steve.museum prototype tagger represented terms not found in museum documentation (in one case, more than 90% of tags were 'new' terms).  These new terms are viewed as an unparalleled opportunity to enhance the accessibility of museum objects beyond traditional audiences, augmenting professional descriptions rather than replacing them.
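The steve.museum statistic is simple to compute, incidentally; something along these lines (an illustrative implementation of mine, not the project's actual methodology):

```python
def new_term_rate(tags, controlled_vocab):
    """Fraction of user-contributed tags absent from a controlled
    vocabulary -- the kind of 'new terms' figure reported for the
    steve.museum tagger.  Matching here is naive lowercasing; a real
    study would also normalise plurals, spelling variants, etc."""
    vocab = {term.lower() for term in controlled_vocab}
    new = [t for t in tags if t.lower() not in vocab]
    return len(new) / len(tags) if tags else 0.0
```

A high rate need not mean the tags are wrong, of course; on the argument above, it means users are describing the collections in ways the documentation never did.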

Releasing archival description from the artificial restraints imposed by the canon of professional practice was also a theme of my UCL colleague Jenny Bunn's presentation of her PhD research, 'The Autonomy Paradox'.  Each time I hear her speak, my increased understanding of her research is balanced by simultaneously greater confusion the deeper she gets into second-order cybernetics!  Suffice it to say that I cannot possibly do justice to her research here, but anyone in North America might like to catch her at the Association of Canadian Archivists' Conference in June.  I'm interested in the implications of her research for a move away from hierarchical, or even series-system, description, and whether this might facilitate a more object-oriented view of archival description.

Last term’s Archives and Society series included a talk by Nicole Schutz of Aberystwyth University about her development of a cloud computing toolkit for records management.  This was repeated at the recent meeting of the Data Standards Section of the Archives and Records Association, who had sponsored the research.  At the same meeting, I was pleased to discover that I know more than I thought I did about linked data and RDF, although I am still relieved that Jane Stevenson and the technical team behind the LOCAH Project are pioneering this approach in the UK archives sector and not me!  But I am fascinated by the potential for linked open data to draw in a radical new user community to archives, and will be watching the response to the LOCAH Project with interest.
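For anyone as new to RDF as I thought I was, the basic idea is just to publish descriptions as subject-predicate-object triples using shared vocabularies.  A hand-rolled sketch (real projects such as LOCAH use proper RDF toolchains and much richer ontologies than Dublin Core terms; the example URIs below are invented):

```python
def to_turtle(record_uri, title, creator_uri, date):
    """Emit a minimal Turtle description of one archival record as
    subject-predicate-object triples, using the (real) Dublin Core
    terms vocabulary.  Purely illustrative string-building."""
    triples = [
        (record_uri, "dcterms:title", f'"{title}"'),
        (record_uri, "dcterms:creator", f"<{creator_uri}>"),
        (record_uri, "dcterms:date", f'"{date}"'),
    ]
    lines = ["@prefix dcterms: <http://purl.org/dc/terms/> ."]
    lines += [f"<{s}> {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)
```

The point of the exercise is the creator URI: once the creator is a link rather than a text string, anyone else's data about the same person (or place, or subject) can be joined to the archival description, which is exactly the radical new audience opportunity I find so fascinating.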

The Linked Data theme was continued at the UKAD (UK Archives Discovery Network) Forum held at The National Archives on 2 March.  There was a real buzz to the day – so nice to attend an archives event that was full of positive energy about the future, not just ‘tough talk for tough times’.  There were three parallel tracks for most of the day, plus a busking space for short presentations and demos.  Obviously, I couldn’t get to everything, but highlights for me included:

  • the discovery of a second archives Linked Data project – the SALDA project at the University of Sussex, which is extracting archival descriptions from CALM using EAD and transforming them into Linked Data
  • Victoria Peters' overview of the open source archival description software ICA-AtoM – feedback is welcomed, I think, on the University of Strathclyde's new online catalogue, which uses ICA-AtoM
  • chatting about Manchester Archive + (Manchester archival images on flickr)
  • getting an insider's view of HistoryPin and Ancestry's World Archives Project – the latter particularly fascinating to me in relation to motivating and supporting contributors in online archival settings

Slides from the day, including mine on Crowds and Communities in the Archives, are being gathered together on slideshare at http://www.slideshare.net/tag/ukad.  Initial feedback from the day was good, and several people have blogged about the event (including Bethan Ruddock from the ArchivesHub, a taxonomist’s viewpoint at VocabControl, Karen Watson from the SALDA Project, and The Questing Archivist).

Edit to add Kathryn Hannan’s Archives and Auteurs blog post.


A bit late with this, but I've just noticed that fellow National Archives / UCL PhD student Ann Fenech has posted her 3-minute presentation from the recent PhD day held at The National Archives on her blog, and it's occurred to me that mine is probably quite a good short introduction to what I'm working on too:


A round-up of a few pieces of digital goodness to cheer up a damp and dark start to October:

What looks like a bumper new issue of the Journal of the Society of Archivists (shouldn't it be getting a new name?) is published today.  It has an oral history theme, but it was actually the two articles that don't fit the theme which caught my eye for this blog.  Firstly, Viv Cothey's final report on the digital curation project GAip and SCAT at Gloucestershire Archives, with which I had a minor involvement as part of the steering group for the Society of Archivists-funded part of the work.  The demonstration software developed by the project is now available for download via the project website.  Secondly, Candida Fenton's dissertation research on the Use of Controlled Vocabulary and Thesauri in UK Online Finding Aids will be of interest to my colleagues in the UKAD network.  The issue also carries a review, by Alan Bell, of Philip Bantin's book Understanding Data and Information Systems for Recordkeeping, which I've also found a helpful way in to some of the more technical electronic records issues.  If you do not have access via the authentication delights of Shibboleth, no doubt the paper copies will be plopping through ARA members' letterboxes shortly.

Last night, by way of supporting the UCL home team (read: total failure to achieve self-imposed writing targets), I had my first go at transcribing a page of Jeremy Bentham's scrawled notes on Transcribe Bentham.  I found it surprisingly difficult, even on the 'easy' pages!  Admittedly, my palaeographical skills are probably a bit rusty, and Bentham's handwriting and neatness leave a little to be desired – he seems to have been a man in a hurry – but what I found most tricky was not being able to glance at the page as a whole and get the gist of the sentence ahead at the same time as attempting to decipher particular words; in particular, not being able to search down the whole page looking for similar letter shapes.  The navigation tools do allow you to pan, scroll and zoom, but when you've got the editing pane up on the screen as well as the document, you're a bit squished for space.  Perhaps it would be easier if I had a larger monitor.  Anyway, it struck me that this type of transcription task is definitely a challenge for people who want to get their teeth into something, not the type of thing you might dip in and out of in a spare moment (like indicommons on iPhone and iPad, for instance).

I'm interested in reward and recognition systems at the moment, and in how crowdsourcing projects seek to motivate participants to contribute.  Actually, it's surprising how many projects seem not to think about this at all – the build-it-and-wait-for-them-to-come attitude.  Quite often, it seems, the result is that 'they' don't come, so it's interesting to see Transcribe Bentham experimenting with a number of tricks for monitoring progress and encouraging people to keep on transcribing.  There's the Benthamometer for checking overall progress; you can set up a watchlist to keep an eye on pages you've contributed to; registered contributors can create a user profile to state their credentials and chat to fellow transcribers on the discussion forum; and there's a points system, based on how active you are on the site, with a leader board of top transcribers.  The leader board seems to be fuelling a bit of healthy transatlantic competition at the moment, but given the 'expert', wanting-to-crack-a-puzzle nature of the task here, I wonder whether the social and community-building facilities might prove more effective over the longer term than the quantitative approaches.  One to watch.

Finally, anyone with the techie skills to mash up data ought to be welcoming The National Archives' work on designing the Open Government Licence (OGL) for public sector information in the UK.  I haven't got those skills myself, but I'm welcoming it anyway, in case anyone who has them hasn't yet seen the publicity, and because I am keen to be associated with angels.


A write-up of the second Archival Education Research Institute, which I attended from 21st to 25th June.

The scheduled programme (or program, I suppose!) was a mixture of plenary sessions on the subject of interdisciplinarity in archival research, methods and mentoring workshops, curriculum discussion sessions, and research papers given by both doctoral students and faculty members.  We also experienced two fascinating and engaging, if slightly US-centric, theatrical performances by the University of Michigan’s Center for Research on Learning Theatre Program (ok, now I’m confused – why would it be ‘center’ but not ‘theater’?).

Most valuable to me personally were the methods workshops on Information Retrieval and User Studies.  IR research is largely new to me, although I was aware that current development work at The National Archives [TNA] includes a research strand, carried out at the University of Sheffield's Information Studies Department, which uses IR techniques to investigate information-seeking behaviour across TNA's web domain and catalogue knowledge base.  I was interested to see whether these methods could be adapted for my research interests in user participation.  User Studies turned out to be more familiar territory, not least because of many years' responsibility for coordinating and analysing the Public Services Quality Group [PSQG] Survey of Visitors to UK Archives across the West Yorkshire Archive Service's five offices.  I hadn't previously appreciated that the PSQG survey is unique in the archival world in providing over a decade's worth of longitudinal data on UK archive users (despite what it says on the NCA website, the survey was first run in 1998), and it seems a shame that only occasional annual reports of the survey results have been formally published.

Of the paper sessions, I was particularly interested in several examples of participatory archive projects.  The examples given in the Digital Cultural Communities session – in particular Donghee Sinn’s outline of the No Gun Ri massacre digital archives and Vivian Wong’s film-making work with the Chinese American community in Los Angeles, together with Michelle Caswell’s description of the Cambodian Human Rights Tribunal in the session on Renegotiating Principles and Practice – reinforced my earlier conviction that past trauma or marginalisation may help to promote user-archives collaboration, and provide greater resilience against (or perhaps more sophisticated mechanisms for resolving) controversy.  However, Sue McKemmish and Shannon Faulkhead, in their presentations about another previously persecuted group, Australian Aboriginal communities (specifically the Koorie and Gundjitmara), gave me hope that the participatory attitudes of Indigenous communities are just an early precursor to a much wider social movement which puts a high value upon co-creation and co-responsibility for records and record-keeping.  [Incidentally, if you have access, I see that Sue and Shannon’s Monash colleague Livia Iacovino has just published an article in Archival Science entitled Rethinking archival, ethical and legal frameworks for records of Indigenous Australian communities: a participant relationship model of rights and responsibilities, which looks highly pertinent – it’s currently in the ‘online first’ section.]  I was also interested in Shannon’s comments about developing a framework to incorporate or authenticate traditional oral knowledge as an integral part of the overall community ‘archive’ (I’m not sure I’ve got this quite right, and would like to chat to her further about it).
William Uricchio has remarked of contemporary digital networks that “Decentralized, networked, collaborative, accretive, ephemeral and dynamic… these developments and others like them bear a closer resemblance to oral cultures than to the more stable regimes of print (writing and the printing press) and the trace (photography, film, recorded sound)”¹.  What can we learn from oral culture to inform our development of participatory practice in the digital domain?

Carlos Ovalle gave a useful paper on Copyright Challenges with Public Access to Digital Materials in Cultural Institutions in the Challenges/Problems in Use, Re-use, and Sharing session, which was interesting in the light of the UK Digital Economy Act and recent amendments to UK Copyright legislation, and some of my own current concerns about digitisation practices and business models in UK archives.

I cannot say I particularly enjoyed the plenary sessions and ensuing discussions.  I found the whole dispute about whether archival ‘science’ could, or should, be considered inter-disciplinary or multi-disciplinary, and which disciplines are core or which are peripheral, somewhat sterile and frankly rather futile.  Some of the arguments seemed to stand as witness to a kind of professional identity crisis, undermining any claim that archival research might have to a wider relevance in the modern world.  I was particularly surprised at how controversial ‘collaboration’ seemed to be in a US research context – a striking contrast I felt to the pervasive ‘partnership’ ethos that is accepted best practice in fields with which I am familiar in the UK.  Not just, I think, because I worked for what is in many ways a pioneering partnership of local authorities at West Yorkshire Joint Services; the current government policy on archives, Archives for the 21st Century similarly emphasises the benefits and indeed necessity (in the current economic climate) of partnership working in a specific archives context.

Sadly, there doesn’t seem to have been much blogging about AERI, but you can read one of the Australian participant’s Lessons from AERI Part I (is there a part II coming soon, Leisa?!).  I’ll link to any further blog posts I notice in the comments.

Finally, nothing to do with AERI, but I’ve finally got round to registering this blog with technorati and need to include the claim code in a post, so here goes: CF2RCBCUPWQC.

¹Uricchio, W. ‘Moving Beyond the Artifact: Lessons from Participatory Culture’ in Preserving the Digital Heritage Netherlands National Commission for UNESCO, 2007.  <http://www.knaw.nl/ecpa/publ/pdf/2735.pdf>

Read Full Post »

In conversation with the very excellent RunCoCo project at Oxford University last Friday, I revisited a question which will, I think, prove central to my current research – establishing trust in an online archival environment.  This is an important issue both for community archives, such as Oxford’s Great War Archive, as well as for conventional Archive Services which are taking steps to open up their data to user input in some way – whether this be (for example) by enabling user comments on the catalogue, or establishing a wiki, or perhaps making digitised images available on flickr.

A simple, practical scenario to surface some of the issues:

An image posted to flickr with minimal description.  Two flickr users, one clearly a member of staff at the Archives concerned, have posted suggested identifications.  Since they both in fact offer the same name (“Britannia Mill”), it is not immediately clear whether they both refer to the same location, or whether the second comment contradicts the first.

  • Which comment (if either) correctly identifies the image?
  • Would you be inclined to trust an identification from a member of staff more readily than you’d accept “Arkwright”‘s comment?  If so, why?
  • Clicking on “Arkwright”‘s profile, we learn that he is a pensioner who lives locally.  Does this alter your view of the relative trustworthiness of the two comments (for all we know, the member of staff might have moved into the area just last week)?
  • How could you test the veracity of the comments?  Whose responsibility is this?
  • If you feel it’s the responsibility of the Archive Service in question, what resources might be available for this work?
  • If you worked for the Archive Service, would you feel happy to incorporate information derived from these comments into the organisation’s finding aids?  Bear in mind that any would-be user searching for images of “Britannia Mills” – wherever the location – would not find this image using the organisation’s standard online catalogue: is potentially unreliable information better than no information at all?
  • What would you consider an ‘acceptable’ quality and/or quantity level for catalogue metadata for public presentation?  You might think this photograph should never have been uploaded to flickr in its current state – but even this meagre level of description has been sufficient to start an interesting – potentially useful? – discussion, just as a relatively poor quality scan has been ‘good enough’ to enable public access outside of the repository, although it would certainly not suffice for print publication.

Such ambivalence and uncertainty about accepting user contributions is one reason that The National Archives wiki Your Archives was initially designed “to be ‘complementary’ to the organisation’s existing material” rather than fully integrated into TNA’s website.

In our discussion on Friday, we identified four ways in which online archives might try to establish trust in user contributions:

  • User Profiles: enabling users to provide background information on their expertise.  The Polar Bear Expedition Archives at the University of Michigan have experimented with this approach for registered users of the site, with apparently ambiguous results.  Similar features are available on the Your Archives wiki, although again few users appear to use them, except for staff of TNA.  Surfacing the organisational allegiance of staff is of course important, but would not inherently make their comments more trustworthy (as discussed above), unless more in-depth information about their qualifications and areas of expert knowledge is also provided.  A related debate about whether or not to allow anonymous comments, and the reliability of online anonymous contributions, extends well beyond the archival domain.
  • Shifting the burden of proof to the user: offering to make corrections to organisational finding aids upon receipt of appropriate documentation.  This is another technique pioneered on the Polar Bear Expedition Archives site, but might become burdensome given a particularly active user community.
  • Providing user statistics and/or manipulating the presentation of user contributions on the basis of user statistics: i.e. giving more weight to contributions from users whose previous comments have proved to be reliable.  Such techniques are used on Wikipedia (users can earn enhanced editing rights by gaining the trust of other editors), and user information is available from Your Archives, although somewhat cumbersome to extract – in its current form, I think it is unlikely anybody would use this information to form reliability judgements.  This technique is sometimes also combined with…
  • Rating systems: these can be either organisation-defined ratings (as, for instance, on the Brooklyn Museum Collection Online – I do not know of an archives example) or user-defined (the familiar Amazon or eBay ranking systems – but, again, I can’t think of an instance where such a system has been implemented in an archives context, although it is often talked about – can you?).  Flickr implements a similar principle, whereby registered users can ‘favourite’ images.
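
The third and fourth techniques above could in principle be combined into a simple reputation score.  None of the sites mentioned publish their algorithms, so the following is purely a hypothetical sketch of my own: weight each contributor by a smoothed ratio of their past comments that were verified against those that were rejected, and surface the most reliable contributors first.

```python
# Hypothetical sketch of reputation-weighted contributions.  The scoring
# rule (a Laplace-smoothed acceptance ratio) and all names are my own
# invention, not any site's actual implementation.

def trust_score(accepted: int, rejected: int) -> float:
    """Estimated probability that this user's next comment is reliable.

    Laplace smoothing (+1/+2) means a brand-new user starts at 0.5
    rather than an unearned 0.0 or 1.0.
    """
    return (accepted + 1) / (accepted + rejected + 2)

def rank_comments(comments):
    """Order comments so those from historically reliable users come first."""
    return sorted(
        comments,
        key=lambda c: trust_score(c["accepted"], c["rejected"]),
        reverse=True,
    )

# Invented figures for the flickr scenario discussed above:
comments = [
    {"user": "staff_member", "text": "Britannia Mill?", "accepted": 2, "rejected": 0},
    {"user": "Arkwright", "text": "Britannia Mill", "accepted": 40, "rejected": 3},
]
ranked = rank_comments(comments)
```

Note that under this scheme a local pensioner with a long track record of verified identifications would quite rightly outrank a recently arrived member of staff – which is exactly the point of the flickr example above.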

A quick scan of Google Scholar reveals much research into establishing trust in the online marketplace, and of trust-building in the digital environment as a customer relationship management activity.  But are these commercial models necessarily directly applicable to information exchange in the archives environment, where the issue at stake is not so much the customer’s trust in the organisation or project concerned (although this clearly has an impact on other forms of trust) so much as the veracity and reliability of the historical information presented?

Do you have any other suggestions for techniques which could be (or are) used to establish trust in online archives, or further good examples of the four techniques outlined above in archival practice?  It strikes me that all four options rely heavily upon human interpretation and judgement calls, so scalability will become an issue with very large datasets (particularly those held outside an organisational website) which the Archives may want to manipulate machine-to-machine (see this recent blog post and comments from the Brooklyn Museum).

Read Full Post »

I hinted in the post below that there might be some changes coming up on this blog.  This is because, as some of you will already know, I have moved on from West Yorkshire Archive Service, to start a PhD jointly supervised by UCL’s Department of Information Studies and The National Archives provisionally entitled ‘We Think, Not I think: Harnessing collaborative creativity to archival practice; implications of user participation for archival theory and practice‘.

This means that my interests are expanding beyond the original focus of Around the World in Eighty Gigabytes, which I originally set up to document my own voyages of discovery about digital preservation and how international initiatives in this field might be scaled down to apply within the small archives settings with which I was most familiar.  I have umm-ed and ah-ed for a bit about what I should do now – start a new blog or morph this one to cover aspects of user participation?  In the end, I have decided to continue with 80GB.  There are various reasons for this:

  • There are several common strands between digital preservation research and my current interests in user collaboration – both relate to the impact of digital technologies on archival theory and practice, and many of the major issues (e.g. authority, context, trust, the cultural challenges of embedding technological change in operational settings) are debated in both areas of research.  I had been thinking that these common themes would make for a good posting on Ada Lovelace Day, but I didn’t, er, quite get round to it!
  • I haven’t stopped being interested in digital preservation, or in the impact of digital technology on smaller archives, and I will continue to post on both themes when opportunities arise.
  • I want a space to express my own personal opinions on things which interest me and to explore ideas.  What I post here will not represent the views of The National Archives or UCL any more than my previous postings represented the official stance of West Yorkshire Archive Service.
  • I flatter myself to think there are a few people who read my ramblings, and know me as 80GB.  If they are interested in digital preservation and small archives, and are into following obscure blogs, I suspect they may be interested in reading about the implications of social media on archives too.
  • Putting everything together should mean that I actually update the blog rather more regularly.
  • To be blunt, there are a few events coming up that I think I will want to write about, and I can’t be bothered to set up a new blog…

However, if either of my current readers thinks that this is a really bad idea, they should please let me know in the comments…

Read Full Post »

Finally getting around to posting a little something about the web archiving conference held at the British Library a couple of weeks ago.

From a local archives perspective, it was particularly interesting to hear a number of presenters acknowledge the complexity and cost of implementing and using currently available web archiving tools.  Richard Davis, talking about the ArchivePress blog archiving project, went so far as to argue that this amounted to using a ‘hammer to crack a nut’, and we’ll certainly be keeping an eye out at West Yorkshire Archive Service for potential new use cases for ArchivePress’s feed-focused methodology and tools.  ArchivePress should really appeal to my fellow local authority archivist Alan, who is always on the look-out for self-sufficiency in digital preservation solutions.
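
The feed-focused idea is worth spelling out: rather than crawling a blog’s rendered pages, you capture posts from the structured RSS feed the blog already publishes.  ArchivePress itself is built on WordPress, so the following stdlib sketch is mine, not its actual implementation – just an illustration of why feeds make blog archiving a much smaller nut to crack:

```python
# Minimal illustration of feed-focused blog capture: parse an RSS 2.0
# feed and keep only the archivable substance of each post.  This is a
# sketch of the general idea, not ArchivePress's actual code.
import xml.etree.ElementTree as ET

def posts_from_rss(rss_xml: str):
    """Extract title, link, date and content for each post in an RSS feed."""
    channel = ET.fromstring(rss_xml).find("channel")
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
            "content": item.findtext("description"),
        }
        for item in channel.iter("item")
    ]

# A tiny made-up feed for demonstration:
sample = """<rss version="2.0"><channel><title>Example blog</title>
<item><title>First post</title><link>http://example.org/1</link>
<pubDate>Mon, 01 Mar 2010 09:00:00 GMT</pubDate>
<description>Hello world.</description></item>
</channel></rss>"""

posts = posts_from_rss(sample)
```

In practice you would fetch the live feed on a schedule and store each new item as it appears – no site-wide crawler, and no software for the blog owner to install.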

I also noted Jeffrey van der Hoeven’s suggestion that smaller archives might in future be able to benefit from the online GRATE (Global Remote Access to Emulation Services) tool developed as part of the Planets project, offering emulation over the internet through a browser without the need to install any software locally.

Permission to harvest websites, particularly in the absence of updated legal deposit legislation in the UK, was another theme which kept cropping up throughout the day.  So here is a good immediate opportunity for local archivists to get involved in suggesting sites for the UK Web Archive, making the most of our local network of contacts.  Although I still think there is a gap here in the European web archiving community for an Archive-It type service to enable local archivists to scope and run their own crawls to capture at-risk sites at sometimes very short notice, as we had to at West Yorkshire Archive Service with the MLA Yorkshire website.

Archivists do not (or should not) see websites in isolation – they are usually one part of a much wider organisational archival legacy.  To my mind, the ‘web archiving’ community is at present too heavily influenced by a library model and mindset, which concentrates on thematic content and pays too little attention to more archival concerns, such as provenance and context.  So I was pleased to see this picked up in the posting and comments on Jonathan Clark’s blog about the Enduring Links event.

Lastly in my round-up, Cathy Smith from TNA had some interesting points to make from a user perspective.  She suggested that although users might prefer a single view of a national web collection, this did not necessarily imply a single repository – although collecting institutions still need to work together to eliminate overlap and to coordinate presentation.  This – and the following paper on TNA’s Digital Continuity project – set me thinking, not for the first time, about some potential problems with the geographically defined collecting remits of UK local authority archive services in a digital world.  After all, to the user, local and central government websites are indistinguishable at the .gov.uk domain level, not to mention that much central government policy succeeds or fails depending on how it is delivered at local level.  Follow almost any route through DirectGov and you will end up at a search page for local services.  Websites, unlike paper filing series, do not have distinct, defined limits.  One of the problems with the digital preservation self-sufficiency argument is that the very nature of the digital world – and increasingly so in an era of mash-ups and personalised content – is the exact opposite, highly interdependent and complex.  So TNA’s harvesting of central government websites may be of limited value over the long-term, unless it is accompanied by an equally enthusiastic campaign to capture content across local government in the UK.

Slides from all the presentations are available on the DPC website.

Read Full Post »

Older Posts »