
Posts Tagged ‘participation’

Day 1 Proper of the conference began with acknowledgements to the organisers, some kind of raffle draw and then a plenary address by an American radio journalist.  Altogether this conference has a celebratory feel to it – fitting since this is SAA’s 75th Anniversary year, but very different in tone from the UK conferences where the opening keynote speaker tends to be some archival luminary.  More on the American archival cultural experience later.

My session with Kate Theimer (of ArchivesNext fame) and Dr Elizabeth Yakel from the University of Michigan (probably best known amongst tech-savvy UK practitioners for her work on the Polar Bear Expedition Finding Aid) followed immediately afterwards, and seemed to go well.  The session title was: “What Happens After ‘Here Comes Everybody’: An Examination of Participatory Archives”.  Kate proposed a new definition for Participatory Archives, distinguishing between participation and engagement (outreach); Beth spoke about credibility and trust; and my contribution was primarily concerned with contributors’ motivations to participate.  A couple of people, Lori Satter and Mimi Dionne, have already blogged about the session (did I really say that?!), and here are my slides:

After lunch, I indulged in a little session-hopping, beginning in session 204, where I heard about Jean Dryden’s copyright survey of American institutions, which asked whether copyright limits access to archives by restricting digitisation activity.  Dryden found that American archivists tended to take a very conservative approach to copyright expiry terms and to obtaining third-party permission for use, even though many interviewees felt that it would be good to take a bolder line.  Also, some archivists’ knowledge of American copyright law was shaky – sounds familiar!  It would be interesting to see how UK attitudes would compare; I suspect the results would be similar.  However, I also wonder how easy it is in practical terms to suddenly start taking more of a risk-management approach to copyright after many years of insisting upon strict copyright compliance.

Next I switched to session 207, The Future is Now: New Tools to Address Archival Challenges, hearing Maria Esteva speak about some interesting collaborative work between the Texas Advanced Computing Center and NARA on visual finding aids, similar to the Australian Visible Archive research project. At the Exhibit Hall later, I picked up some leaflets about other NARA Applied Research projects and tools for file format conversion, data mining and record type identification which were discussed by other speakers in this session.

Continuing the digitisation theme, although with a much more philosophical focus, Joan Schwartz in session 210, Genuine Encounter, Authentic Relationships: Archival Covenant & Professional Self-Understanding, discussed the loss of materiality and context resulting from the digitisation of photographs (for example, a thumbnail image presented out of its album context).  She commented that archivists are often too caught up with the ‘how’ of digitisation rather than the ‘why’.  I wouldn’t disagree with that.

Back to the American archival cultural experience: I was invited to the University of Michigan ‘alumni mixer’ in the evening – a drinks reception with some short speeches updating alumni on staff news and recent developments in the archival education courses at the university.  More generally, archives students are much in evidence here: there are special student ‘ribbons’ to attach to name badges, many students are presenting posters on their work, and there is a special careers area where face-to-face advice is available from more senior members of SAA, current job postings are advertised, and new members of the profession can even pin up their CVs.  Some of this (the public posting of CVs in particular) might seem a bit pushy for UK tastes, and the one-year length of UK Masters programmes (and the timing of the conference) of course precludes the presentation of student dissertation work.  But the general atmosphere seems very supportive of new entrants to the profession, and I feel there are ideas here that ARA’s New Professionals section might like to consider for future ARA Conferences.


Read Full Post »

This post is a thank you to my followers on Twitter, for pointing me towards many of the examples given below.  The thoughts on automated description and transcription are a preliminary sketching out of ideas (which, I suppose, is a way of excusing myself if I am not coherent!), on which I would particularly welcome comments or further suggestions:

A week or so before Easter, I was reading a paper about the classification of galaxies on the astronomical crowdsourcing website, Galaxy Zoo.  The authors use a statistical (Bayesian) analysis to distil an accurate sample of data, and then compare the reliability of this crowdsourced sample to classifications produced by expert astronomers.  The article also refers to the use of sample data in training artificial neural networks in order to automate the galaxy classification process.
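The statistical detail is beyond the scope of this post, but the underlying idea – weighting each volunteer’s vote by an estimate of their reliability, derived from a small expert-classified sample – can be sketched in a few lines of Python. To be clear, this is my own toy illustration, not the authors’ actual Bayesian analysis:

```python
from collections import Counter, defaultdict

# Toy illustration (not the Galaxy Zoo paper's actual method): estimate how
# reliable each volunteer is against a small expert-classified sample, then
# weight their votes when deciding the consensus class for each galaxy.
def consensus_classifications(votes, expert_sample):
    """votes: list of (user, galaxy_id, label); expert_sample: {galaxy_id: label}."""
    # 1. Score each user by agreement with the expert-classified galaxies.
    correct, seen = defaultdict(int), defaultdict(int)
    for user, galaxy, label in votes:
        if galaxy in expert_sample:
            seen[user] += 1
            correct[user] += (label == expert_sample[galaxy])
    reliability = {u: (correct[u] + 1) / (seen[u] + 2) for u in seen}  # smoothed estimate

    # 2. Weight each vote by the voter's estimated reliability (0.5 if unknown).
    weighted = defaultdict(Counter)
    for user, galaxy, label in votes:
        weighted[galaxy][label] += reliability.get(user, 0.5)

    # 3. The consensus label is the one with the highest total weight.
    return {galaxy: tally.most_common(1)[0][0] for galaxy, tally in weighted.items()}

votes = [("a", "g1", "spiral"), ("b", "g1", "elliptical"), ("c", "g1", "spiral"),
         ("a", "g2", "elliptical"), ("b", "g2", "elliptical")]
print(consensus_classifications(votes, expert_sample={"g2": "elliptical"}))
```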

This set me thinking about archivists’ approaches to online user participation and the harnessing of computing power to solve problems in archival description.  On the whole, I would say that archivists (and our partners on ‘digital archives’ kinds of projects) have been rather hamstrung by a restrictive, ‘human-scale’, qualitatively evaluated vision of what might be achievable through the application of computing technology to such issues.

True, the notion of an Archival Commons evokes a network-oriented archival environment.  But although the proponents of this concept recognise “that the volume of records simply does not allow for extensive contextualization by archivists to the extent that has been practiced in the past”, the types of ‘functionalities’ envisaged to comprise this interactive descriptive framework still mirror conventional techniques of description in that they rely upon the human ability to interpret context and content in order to make contributions imbued with “cultural meaning”.  There are occasional hints of the potential for more extensible (?web scale) methods of description, in the contexts of tagging and of information visualization, but these seem to be conceived more as opportunities for “mining the communal provenance” of aggregated metadata – so creating additional folksonomic structures alongside traditional finding aids.  Which is not to say that the Archival Commons is not still justified from a cultural or societal perspective, but that the “volume of records” cataloguing backlog issue will require a solution which moves beyond merely adding to the pool of potential participants enabled to contribute narrative descriptive content and establish contextual linkages.

Meanwhile, double-keying, checking and data standardisation procedures in family history indexing have come a long way since the debacle over the 1901 census transcription. But double-keying for a commercial partner also signals a doubling of transcription costs, possibly without a corresponding increase in transcription accuracy.  Or, as the Galaxy Zoo article puts it, “the overall agreement between users does not necessarily mean improvement as people can agree on a wrong classification”.  Nevertheless, these norms from the commercial world have somehow transferred themselves as the ‘gold standard’ into archival crowdsourcing transcription projects, in spite of the proofreading overhead (bounded by the capacity of the individual, again).  As far as I am aware, Old Weather (which is, of course, a Zooniverse cousin of Galaxy Zoo) is the only project working with archival content which has implemented a quantitative approach to assess transcription accuracy – improving the project’s completion rate in the process, since the decision could be taken to reduce the number of independent transcriptions required from five to three.
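For what it’s worth, here is a rough sketch (my own simplification, not Old Weather’s actual implementation) of how a quantitative retirement rule might work, and why dropping the required agreement from five keyings to three reduces the effort needed per record:

```python
# A toy sketch: retire a record once enough independent transcriptions agree,
# rather than keying it a fixed number of times regardless of agreement.
def is_retired(transcriptions, required_agreement=3):
    """transcriptions: strings keyed independently by different volunteers."""
    normalised = [t.strip().lower() for t in transcriptions]
    return any(normalised.count(value) >= required_agreement for value in set(normalised))

page = ["51 2N 30 1W", "51 2N 30 1W", "51 2N 301W", "51 2N 30 1W"]
print(is_retired(page, required_agreement=5))  # False: would need further matching passes
print(is_retired(page, required_agreement=3))  # True: three independent keyings already agree
```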

Pondering these and other such tangles, I began to wonder whether there have indeed been any genuine attempts to harness large-scale processing power for archival description or transcription.  Commercial tools designed to decipher modern handwriting are now available (two examples: MyScript for LiveScribe; Evernote’s text recognition tool), so why not an automated palaeographical tool?  Vaguely remembering that The National Archives had once been experimenting with text mining for both cataloguing and sensitivity classification [I do not know what happened to this project – can anyone shed some light on this?], and recollecting the determination of one customer at West Yorkshire Archive Service who valiantly tried (and failed) to teach his Optical Character Recognition (OCR) software to recognise nearly four centuries of clerks’ handwriting in the West Riding Registry of Deeds indexes, I put out a tentative plea on Twitter for further examples of archival automation.  The following examples are the pick of the amazing set of responses I received:

  • The Muninn Project aims to extract and classify written data about the First World War from digitized documents using raw computing power alone.  The project appears to be at an early stage, and is beginning with structured documents (those written onto pre-printed forms) but hopes to move into more challenging territory with semi-structured formats at a later stage.
  • The Dutch Monk Project (not to be confused with the American project of the same name, which facilitates text mining in full-text digital library collections!) seeks to make use of the qualitative interventions of participants playing an online transcription correction game in order to train OCR software for improved handwriting recognition rates in future.  The project tries to stimulate user participation through competition and rewards, following the example of Google Image Labeller.  If your Dutch is good, Christian van der Ven’s blog has an interesting critique of this project (Google’s attempt at translation into English is a bit iffy, but you can still get the gist).
  • Impact is a European-funded project which takes a similar approach to the Monk project, but has focused upon improving automated text recognition for early printed books.  The project has produced numerous tools to improve both OCR image recognition and lexical information retrieval, and a web-based collaborative correction platform for accuracy verification by volunteers.  The input from these volunteers can then in turn be used to further refine the automated character recognition (see the videos on the project’s YouTube channel for some useful introductory materials).  Presumably these techniques could be further adapted to help with handwriting recognition, perhaps beginning with the more stylised court hands, such as Chancery hand.  The division of the quality control checks into separate character-, word- and page-level tasks (as illustrated in this video) is especially interesting, although I think I’d want to take this further and partition the labour on each of the different tasks as well, rather than expecting one individual to work sequentially through each step (see the sketch after this list).  Thinking of myself as a potential volunteer checker, I think I’d be likely to get bored and give up at the letter-checking stage.  Perhaps this rather more mundane task would be more effectively offered in return for a peppercorn payment as a ‘human intelligence task’ on a platform such as Amazon Mechanical Turk, whilst the volunteer time could be more effectively utilised on the more interesting word- and page-level checking.
  • Genealogists are always ahead of the game!  The Family History Technology Workshop held annually at Brigham Young University usually includes at least one session on handwriting recognition and/or data extraction from digitized documents.  I’ve yet to explore these papers in detail, but there looks to be masses to read up on here.
  • Wot no catalogue? Google-style text search within historic manuscripts?  The Center for Intelligent Information Retrieval (University of Massachusetts Amherst) has handwriting retrieval demonstration systems which offer manuscript document retrieval on the fly.
  • Several other tools and projects which might be of interest are listed in this handy Google doc on Transcribing Handwritten Documents, put together by attendees at the DHapi workshop held at the Maryland Institute for Technology in the Humanities earlier this year.  Where I’ve not mentioned specific examples directly here, it’s mostly because they are online user transcription interfaces (which, for the purposes of this post, I’m classing as technology-enhanced projects, as opposed to technology-driven, which is my main focus here – if that makes sense? Monk and Impact creep in above because they combine both approaches).
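On the partitioning idea mentioned under Impact above, here is a hypothetical sketch of how quality-control work might be split into separate queues by granularity, so that no single volunteer has to plod through every step for a page. The structure and names are invented for illustration and bear no relation to Impact’s actual platform:

```python
from collections import defaultdict

# Hypothetical sketch: partition OCR-correction work into separate queues by
# granularity, rather than expecting one volunteer to work through every step.
def build_queues(pages):
    """pages: dicts like {'id': ..., 'low_confidence_chars': [...], 'suspect_words': [...]}"""
    queues = defaultdict(list)
    for page in pages:
        # Character checks: quick and repetitive; candidates for paid micro-tasks.
        for char in page["low_confidence_chars"]:
            queues["character"].append((page["id"], char))
        # Word checks: need some reading in context; suited to casual volunteers.
        for word in page["suspect_words"]:
            queues["word"].append((page["id"], word))
        # Page checks: the most engaging task, reserved for committed volunteers.
        queues["page"].append(page["id"])
    return queues

pages = [{"id": "deed-001", "low_confidence_chars": ["ſ", "þ"], "suspect_words": ["Pontefract"]}]
print(build_queues(pages))
```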

If you know of other examples, please leave a comment…

Read Full Post »

Today I have a guest post about my research on UKOLN’s Cultural Heritage Blog.

Read Full Post »

A round-up of a few pieces of digital goodness to cheer up a damp and dark start to October:

What looks like a bumper new issue of the Journal of the Society of Archivists (shouldn’t it be getting a new name?) is published today.  It has an oral history theme, but actually it was the two articles that don’t fit the theme which caught my eye for this blog.  Firstly, Viv Cothey’s final report on the Digital Curation project, GAip and SCAT, at Gloucestershire Archives, with which I had a minor involvement as part of the steering group for the part of the work funded by the Society of Archivists.  The demonstration software developed by the project is now available for download via the project website.  Secondly, Candida Fenton’s dissertation research on the Use of Controlled Vocabulary and Thesauri in UK Online Finding Aids will be of interest to my colleagues in the UKAD network.  The issue also carries a review, by Alan Bell, of Philip Bantin’s book Understanding Data and Information Systems for Recordkeeping, which I’ve also found a helpful way in to some of the more technical electronic records issues.  If you do not have access via the authentication delights of Shibboleth, no doubt the paper copies will be plopping through ARA members’ letterboxes shortly.

Last night, by way of supporting the UCL home team (read: total failure to achieve self-imposed writing targets), I had my first go at transcribing a page of Jeremy Bentham’s scrawled notes on Transcribe Bentham.  I found it surprisingly difficult, even on the ‘easy’ pages!  Admittedly, my palaeographical skills are probably a bit rusty, and Bentham’s handwriting and neatness leave a little to be desired – he seems to have been a man in a hurry – but what I found most tricky was not being able to glance at the page as a whole and get the gist of the sentence ahead at the same time as attempting to decipher particular words; in particular, not being able to search down the whole page looking for similar letter shapes.  The navigation tools do allow you to pan and scroll, and zoom in and out, but when you’ve got the editing page up on the screen as well as the document, you’re a bit squished for space.  Perhaps it would be easier if I had a larger monitor.  Anyway, it struck me that this type of transcription task is definitely a challenge for people who want to get their teeth into something, not the type of thing you might dip in and out of in a spare moment (like indicommons on iPhone and iPad, for instance).

I’m interested in reward and recognition systems at the moment, and in how crowdsourcing projects seek to motivate participants to contribute.  Actually, it’s surprising how many projects seem not to think about this at all – the ‘build it and wait for them to come’ attitude.  Quite often, it seems, the result is that ‘they’ don’t come, so it’s interesting to see Transcribe Bentham experiment with a number of tricks for monitoring progress and encouraging people to keep on transcribing.  There’s the Benthamometer for checking on overall progress; you can set up a watchlist to keep an eye on pages you’ve contributed to; individual registered contributors can set up a user profile to state their credentials and chat to fellow transcribers on the discussion forum; and there’s a points system, based on how active you are on the site, with a leader board of top transcribers.  The leader board seems to be fuelling a bit of healthy transatlantic competition right at the moment, but given the ‘expert’, wanting-to-crack-a-puzzle nature of the task here, I wonder whether the more social, community-building facilities might prove more effective over the longer term than the quantitative approaches.  One to watch.
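Out of curiosity about the quantitative side, here is a minimal sketch of how a points system and leader board might be computed; the actions and weightings are invented for illustration and are not Transcribe Bentham’s own:

```python
from collections import Counter

# Illustrative point values per action (my own invention, not the project's).
POINTS = {"transcribed": 10, "edited": 3, "comment": 1}

def leaderboard(activity_log, top=5):
    """activity_log: list of (user, action) tuples; returns the top scorers."""
    scores = Counter()
    for user, action in activity_log:
        scores[user] += POINTS.get(action, 0)
    return scores.most_common(top)

log = [("uk_volunteer", "transcribed"), ("us_volunteer", "transcribed"),
       ("us_volunteer", "edited"), ("uk_volunteer", "comment")]
print(leaderboard(log))  # [('us_volunteer', 13), ('uk_volunteer', 11)]
```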

Finally, anyone with the techie skills to mash up data ought to be welcoming The National Archives’ work on designing the Open Government Licence (OGL) for public sector information in the UK.  I haven’t (got the technical skills), but I’m welcoming it anyway in case anyone who has hasn’t yet seen the publicity about it, and because I am keen to be associated with angels.

Read Full Post »

Since it seems a few people read my post about day one of ECDL2010, I guess I’d better continue with day two!

Liina Munari’s keynote about digital libraries from the European Commission’s perspective provided delegates with an early morning shower of acronyms.  Amongst the funder-speak, however, there were a number of proposals from the forthcoming FP7 Call 6 funding round which are interesting from an archives and records perspective, including projects investigating cloud storage and the preservation of context, and appraisal and selection using the ‘wisdom of crowds’.  Also, the ‘Digital Single Market’ will include work on copyright, specifically the orphan works problem, which promises to be useful to the archives sector – Liina pointed out that the total size of the European public domain is smaller than the US equivalent because of the extended period of copyright protection available to works whose current copyright owners are unknown.  But I do wish people would not use the ‘black hole’ description; it’s alarmist and inaccurate.  If we combine this twentieth-century black hole (digitised orphan works) with the oft-quoted born-digital black hole, it seems a wonder we have any cultural heritage left in Europe at all.

After the opening keynote, I attended the stream on the Social Web/Web 2.0, where we were treated to three excellent papers on privacy-aware folksonomies, seamless web editing, and the automatic classification of social tags.  The seamless web editor, Seaweed, is of interest to me in a personal capacity because of its WordPress plugin, which would essentially enable the user to add new posts or edit existing ones directly in a web browser, without recourse to the cumbersome WordPress dashboard and without absent-mindedly adding new pages instead of new posts (which is what I generally manage to do by mistake).  I’m sure there are archives applications too, possibly, for instance, in terms of user interface design for encouraging participation in archival description.  Privacy-aware folksonomies, a system to enable greater user control over tagging (with visibility levels of user only, friends, and tag provider), might have applications in respect of some of the more sensitive archive content, such as mental health records perhaps.  The paper on the automatic classification of social tags will be of particular interest to records managers interested in the searchability and re-usability of folksonomies in record-keeping systems, as well as to archivists implementing tagging systems in online catalogues or digital archives interfaces.
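To make those visibility levels a little more concrete, here is a rough sketch of how tag filtering might work; the data model (and my reading of the ‘tag provider’ level as the widest one) is my own guess for illustration, not the authors’ system:

```python
from dataclasses import dataclass

@dataclass
class Tag:
    item_id: str
    text: str
    owner: str
    visibility: str  # "private" (user only), "friends", or "provider"

def visible_tags(tags, viewer, friends_of):
    """Return the tags a given viewer is allowed to see."""
    result = []
    for tag in tags:
        if tag.owner == viewer:
            result.append(tag)          # owners always see their own tags
        elif tag.visibility == "friends" and viewer in friends_of.get(tag.owner, set()):
            result.append(tag)          # shared with the owner's friends
        elif tag.visibility == "provider":
            result.append(tag)          # visible platform-wide
    return result

tags = [Tag("case-17", "mental health", "alice", "private"),
        Tag("case-17", "interesting", "alice", "friends")]
print([t.text for t in visible_tags(tags, viewer="bob", friends_of={"alice": {"bob"}})])
```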

After lunch we had a poster and demo session.  Those which particularly caught my attention included a poster from the University of Oregon entitled ‘Creating a Flexible Preservation Infrastructure for Electronic Records’, described as the ‘do-it’ solution to digital preservation in a small repository without any money.  Sounded familiar!  The authors, digital library expert Karen Estlund and University Archivist Heather Briston, described how they have made best use of existing infrastructure, such as shared drives (for deposit) and the software package Archivists’ Toolkit (for description).  Their approach is similar to the workflow I put in place for West Yorkshire Archive Service, except that the University are fortunate to be in a position to train staff to carry out some self-appraisal before deposit, which simplifies the process.  I was also interested (as someone who is never really sure why tagging is useful) in a poster, ‘Exploring the Influence of Tagging Motivation on Tagging Behaviour’, which classified taggers into two groups, describers and categorisers, and in the demonstration of the OCRopodium project at King’s College London, exploring the use of optical character recognition (OCR) with typescript texts.

In the final session of the day, I was assigned to the stream on search in digital libraries, where papers explored the impact of the search interface on search tasks, relevance judgements, and search interface design.

Then there was the conference dinner…

Read Full Post »

I had a day at the Society of Archivists’ Conference 2010 in Manchester last Thursday; rather a mixed bag.  I wasn’t there in time for the first couple of papers, but caught the main strand on digital preservation after the coffee break.  It’s really good to see digital preservation issues get such a prominent billing (especially as I understand there were few sessions on digital preservation at the much larger Society of American Archivists’ Conference this year), although I was slightly disappointed that the papers were essentially show-and-tell rehearsals of how various organisations are tackling the digital challenge.  I have given exactly this type of presentation at the Society’s Digital Preservation Roadshows and at various other beginners’/introductory digital preservation events over the past year.  Sometimes, of course, this is precisely what is needed to get the nervous to engage with the practical realities of digital preservation, but all the same, it’s a pity that one or more of the papers at the main UK professional conference of the year did not develop the theme a little further and stimulate some discussion on the wider implications of digital archives.  However, it was interesting to see how the speakers assumed familiarity with OAIS and digital preservation concepts such as emulation.  I suspect some of the audience were left rather bewildered by this, but the fact that speakers at an archives conference feel they can make such assumptions about audience understanding does at least suggest that some awareness of digital preservation theory and frameworks is at last crawling into the professional mainstream.

I was interested in Meena Gautam’s description of the National Archives of India‘s preparations for receiving digital content, which included a strategy for recruiting staff with relevant expertise. Given India’s riches in terms of qualified IT professionals, I would have expected a large pool of skilled people from which to recruit. But the direction of her talk seemed to suggest that, in actual fact, NAI is finding it difficult to attract the experts they require. [There was one particular comment – that the NAI considers conversion to microfilm to be the current best solution for preserving born-digital content – which seemed particularly extraordinary, although I have since discovered the website of the Indian National Digital Preservation Programme, which does suggest that the Indian Government is thinking beyond this analogue paradigm.]  Anyway, NAI are not alone in encountering difficulties in attracting technically skilled staff to work in the archives sector.  I assume that the reason for this is principally economic, in that people with IT qualifications can earn considerably more working in the private sector.

It was a shame that there was not an opportunity for questions at the end of the session, as I would have liked to ask Dr Gautam how archives could or should try to motivate computer scientists and technicians to work in the area of digital preservation.  Later in the same session, Sharon McMeekin from the Royal Commission on the Ancient and Historical Monuments of Scotland advocated that archives organisations should collaborate to build digital repositories, and I and several others amongst the conference Twitter audience agreed.  But from observation of the real archives world, I would suggest that, although most people agree in principle that collaboration is the way forward, there is very little evidence – as yet at least – of partnership in practice.  I wonder just how likely it is that joint repositories will emerge in this era of recession and budget cuts (which might be when we need collaboration most, but when in reality most organisations’ operations become increasingly internally focused).  Since it seems archives are unable to compete in attracting skilled staff in the open market, and – for a variety of reasons – it seems that the establishment of joint digital repositories is hindered by traditional organisational boundaries, I pondered whether a potential solution to both issues might lie in Yochai Benkler’s third organisational form of commons-based peer-production: as the means both to motivate a community of appropriately skilled experts to contribute their knowledge to the archives sector, and to build sustainable digital archives repositories in common.  There are already, of course, examples of open source development in the digital archives world (Archivematica is a good example, and many other tools, such as the National Archives of Australia’s Xena and The (UK) National Archives’ DROID, are available under open source licences), since the use of open standards fits well with the preservation objective.  Could the archives profession build on these individual beginnings in order to stimulate or become the wider peer community needed to underpin sustainable digital preservation?

After lunch, we heard from Dr Elizabeth Shepherd and Dr Andrew Flinn on the work of the ICARUS research group at UCL’s Department of Information Studies, of which my user participation research is a small part.  It was good to see the Twitter discussion really pick up during the paper, and there was a good question-and-answer session afterwards.  Sarah Wickham has a good summary of this presentation.

Finally, at the end of the day, I helped out with the session to raise awareness of the UK Archives Discovery Network, and to gather input from the profession on how they would like UKAD to develop.  We asked for comments, on post-it notes, in response to a series of ‘impertinent questions’.  I was particularly interested in the outcome of the question based upon UKAD’s Objective 4: “In reality, there will always be backlogs of uncatalogued archives.” Are volunteers the answer?  From the responses we gathered, there does appear to be increasing professional acceptance of the use of volunteers in description activities, although I suspect our use of the word ‘volunteer’ may be holding back appreciation of an important difference between the role of ‘expert’ volunteers in archives and user participation by the crowd.

Read Full Post »
