Archive for September, 2010

Day 3 of ECDL started for me with the Query Log Analysis session.  I thought perhaps that, now the papers were getting heavily into IR technicalities, I might not understand what was being presented or that it would be less relevant to archives.  How wrong can you be!  Well, ok, IR metrics are complex, especially for someone new to the field, but when the first presentation was based upon a usability study of the EAD finding aids at the Nationaal Archief (the National Archives of the Netherlands), it wasn’t too difficult to spot the relevance.  In fact, it was interesting to see how you notice things when the test data is presented in a foreign language, that you wouldn’t necessarily observe if they were in your mother tongue.  In the case of the Nationaal Archief, I was horrified at how many clicks were required to reach an item description.  Most archives have this problem with web-based finding aids (unless they merely replicate a traditional format, for instance, a PDF copy of a paper list), but somehow it was so much more obvious when I wasn’t quite sure exactly what was being presented to me at each stage of the results.  This is what it must be like to be an archival novice.  No wonder they give up.

The second paper of the morning, Determining Time of Queries for Re-ranking Search Results, was also very pertinent to searching in an archival context.  It discussed ‘temporal documents’ where either the terminology itself has changed over time or time is highly relevant to the query.  This temporal intent may be either implicit or explicit in the query.  For example, ‘tsunami + Thailand’ is likely to refer to the 2004 tsunami.  These kinds of issues are obviously very important for historians, and for archivists making temporal collections available in a web environment, such as web archives and online archival finding aids.
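To make the idea of temporal intent a little more concrete, here is a minimal Python sketch of time-aware re-ranking. It is not the method from the paper: the `Doc` fields, the linear scoring formula, the `weight` and `window` parameters and the sample scores are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    text_score: float  # relevance from an ordinary keyword search (made-up values)
    year: int          # year the document was created or refers to


def rerank_by_time(docs, query_year, weight=0.5, window=10):
    """Blend text relevance with closeness to the query's explicit or inferred year."""
    def combined(doc):
        # Temporal score falls linearly from 1.0 (same year) to 0.0
        # (`window` or more years away from the query year).
        temporal = max(0.0, 1.0 - abs(doc.year - query_year) / window)
        return (1 - weight) * doc.text_score + weight * temporal

    return sorted(docs, key=combined, reverse=True)


# Hypothetical results for 'tsunami + Thailand', with 2004 as the inferred query time.
docs = [
    Doc("Tsunami warning systems review", 0.9, 2010),
    Doc("Thailand tsunami relief appeal", 0.8, 2004),
    Doc("Thailand tourism brochure", 0.4, 1995),
]
ranked = rerank_by_time(docs, query_year=2004)
```

In this toy example the 2004 document moves above the document with the best text match, because it sits closest in time to the inferred intent of the query.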

Later in the morning, I was down to attend the stream on Domain-specific Digital Libraries.  One of these specific domains turned out to be archives, with an (appropriately) very philosophical paper presented by Pierre-Edouard Portier about DINAH [in French].  This is “a philological platform for the construction of multi-structured documents”, created to enable the transcription and annotation of the papers of the French philosopher, Jean-Toussaint Desanti, and to facilitate the visualization of the trace of user activities.  My tweeting of this paper (limited on account of both the presentation’s intellectual and technical complexity and the fact that I’d got to bed at around 3am that morning!) seemed to catch the attention of both the archival profession and the Linked Data community;  it certainly deserves some further coverage in the English-speaking archival professional literature.

In the same session, I was also interested in the visualization techniques presented for time-oriented scientific data by Jürgen Bernard, which reminded me of The Visible Archive research project funded by the National Archives of Australia.  The principle – that visual presentations are a useful, possibly preferable, alternative to text-based descriptions of huge series of data – is the same in both cases.  Similarly, the PROBADO project has investigated the development of tools to store and retrieve complex, non-textual data and objects, such as 3D CAD drawings and music.  There were important implications from all of these papers for the future development of archival finding aids.

In the afternoon, I found myself helping out at the Networked Knowledge Organization Systems/Services (NKOS) workshop.  I wasn’t really sure what this entailed, but it turned out to involve things like thesauri construction and semantic mapping between systems, all of which is very relevant to the UK Archives Discovery (UKAD) Network objectives.  I was particularly sorry I was unable to make the Friday session of the workshop, which was to be all about user-centred knowledge system design and Linked Data; however, the slides are all available with the programme for the workshop.

Once again, my sincere thanks to the conference organisers for my opportunity to participate in ECDL2010.  The conference proceedings are available from Springer, for those who want to follow up further, and presentation slides are gradually appearing on the conference website.


Since it seems a few people read my post about day one of ECDL2010, I guess I’d better continue with day two!

Liina Munari’s keynote about digital libraries from the European Commission’s perspective provided delegates with an early morning shower of acronyms.  Amongst the funder-speak, however, there were a number of proposals from the forthcoming FP7 Call 6 funding round which are interesting from an archives and records perspective, including projects investigating cloud storage and the preservation of context, and on appraisal and selection using the ‘wisdom of crowds’. Also, the ‘Digital Single Market’ will include work on copyright, specifically the orphan works problem, which promises to be useful to the archives sector – Liina pointed out that the total size of the European Public Domain is smaller than the US equivalent because of the extended period of copyright protection available to works whose current copyright owners are unknown. But I do wish people would not use the ‘black hole’ description; it’s alarmist and inaccurate.  If we combine this twentieth century black hole (digitised orphan works) with the oft-quoted born-digital black hole, it seems a wonder we have any cultural heritage left in Europe at all.

After the opening keynote, I attended the stream on the Social Web/Web 2.0, where we were treated to three excellent papers on privacy-aware folksonomies, seamless web editing, and the automatic classification of social tags. The seamless web editor, seaweed, is of interest to me in a personal capacity, because of its WordPress plugin, which would essentially enable the user to add new posts or edit existing ones directly in a web browser without recourse to the cumbersome WordPress dashboard, and without absent-mindedly adding new pages instead of new posts (which is what I generally manage to do by mistake). I’m sure there are archives applications too, possibly for instance in terms of the user interface design for encouraging participation in archival description.  Privacy-aware folksonomies, a system to enable greater user control over tagging (with levels user only, friends, and tag provider), might have application in respect of some of the more sensitive archive content, such as mental health records perhaps.  The paper on the automatic classification of social tags will be of particular interest to records managers interested in the searchability and re-usability of folksonomies in record-keeping systems, as well as to archivists implementing tagging systems into the online catalogue or digital archives interfaces.

After lunch we had a poster and demo session.  Those which particularly caught my attention included a poster from the University of Oregon entitled ‘Creating a Flexible Preservation Infrastructure for Electronic Records’ and described as the ‘do-it’ solution to digital preservation in a small repository without any money.  Sounded familiar!  The authors, digital library expert Karen Estlund and University Archivist Heather Briston, described how they have made best use of existing infrastructure, such as share drives (for deposit) and the software package Archivists Toolkit for description.  Their approach is similar to the workflow I put in place for West Yorkshire Archive Service, except that the University are fortunate to be in a position to train staff to carry out some self-appraisal before deposit, which simplifies the process.  I was also interested (as someone who is never really sure why tagging is useful) in a poster ‘Exploring the Influence of Tagging Motivation on Tagging Behaviour’ which classified taggers into two groups, describers and categorisers, and in the demonstration of the OCRopodium project at King’s College London, exploring the use of optical character recognition (OCR) with typescript texts.

In the final session of the day, I was assigned to the stream on search in digital libraries, where papers explored the impact of the search interface on search tasks, relevance judgements, and search interface design.

Then there was the conference dinner…


I am extremely lucky to have been offered a student place helping out at ECDL 2010, the European Conference on Research and Advanced Technology for Digital Libraries. The following are the highlights from day 1 of the conference for this archivist let loose in the virtual stacks:

Susan Dumais’ keynote presented recent Microsoft research into the temporal dynamics of the web, analysing both changes to content and how people revisit web pages, checking for new content or looking for previously found information. She argued that the current generation of web browsers offer only a static, snapshot view, and went on to illustrate a browser plugin called DiffIE which highlights what has changed on a web page since the user’s last visit. She also presented some initial evaluation of this tool, which indicated that although perceptions of revisitation frequency remained constant, in practice users of the plugin increased their revisitation rate. There are lots of potential applications for this kind of tool for archives – from the presentation of web archives to the user interactions/annotations/ratings examples that Dumais herself gave. She also spoke about the implications of her research for the ranking and presentation of search results, illustrating how the pertinency and hence relevancy of certain terms can decline over time – for example, a user searching for ‘US Open’ this week is more likely to be looking for information on the tennis grand slam than the golf tournament. Again, there are some interesting implications here for archival catalogue and document search systems.

Christos Papatheodorou from the Ionian University on Corfu spoke about the mapping of disparate cultural heritage (archives, museums, libraries) XML-based metadata schema to the CIDOC CRM ontology, and went on to describe the transformation of XPath queries submitted to a local (XML) data source into equivalent queries suitable to be submitted to other data sources, via the CIDOC CRM ontology. Having travelled up to Glasgow on the sleeper, arriving at 7 in the morning, I confess I got a bit lost in the technicalities from this point onwards, but the basic idea is to use CIDOC CRM as a mediator between disparate cultural heritage sources marked up in different XML schema. There was an extended worked example using EAD, which was nice to hear. In general, it has been interesting to observe a large number of papers at this conference which report experiments based upon data from cultural heritage rather than scientific domains. All of which tends to reinforce my thoughts after the Society of Archivists’ Conference about attracting technology experts to work in the archives sector: cultural heritage data is complex and thus, it seems, fascinating and intrinsically motivating to work with. We should be more proactive about promoting archival data to this kind of digital research community.
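For anyone else who got lost at the same point, the mediator idea can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the approach presented in the paper: the two lookup tables, the Dublin Core style target paths and the `translate_xpath` helper are all hypothetical, and a real mapping to CIDOC CRM involves chains of classes and properties rather than a single label per element.

```python
# Local schema (EAD) XPaths mapped to a shared concept in the mediating ontology.
# The one-to-one pairings are illustrative assumptions only.
EAD_TO_CRM = {
    "/ead/archdesc/did/unittitle": "E35 Title",
    "/ead/archdesc/did/unitdate": "E52 Time-Span",
}

# The same shared concepts mapped out to a second, hypothetical XML source.
CRM_TO_DC = {
    "E35 Title": "/record/dc:title",
    "E52 Time-Span": "/record/dc:date",
}


def translate_xpath(xpath, to_mediator, from_mediator):
    """Rewrite a query path for another source by going through the mediator."""
    concept = to_mediator.get(xpath)
    if concept is None:
        return None  # the local path has no mapping to the ontology
    return from_mediator.get(concept)


title_query = translate_xpath("/ead/archdesc/did/unittitle", EAD_TO_CRM, CRM_TO_DC)
unmapped_query = translate_xpath("/ead/archdesc/did/physdesc", EAD_TO_CRM, CRM_TO_DC)
```

The attraction of the design is that each schema only needs one mapping, to the ontology, rather than a separate mapping to every other schema it might be queried alongside.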

I’d been particularly looking forward to the paper on User-Contributed Metadata for Libraries and Cultural Institutions, although this turned out to be a Drexel University re-working of the Library of Congress flickr Commons experience, albeit concentrating more on user comments and less upon tagging. I was not quite comfortable anyway with the a priori categorization of comments described in the paper (into 1. personal and historical 2. link out (eg to wikipedia) 3. corrections and translations 4. link in (eg adding images to flickr groups) – seems to me that category 1. includes a particularly wide range of possible comment types), plus all the things I wanted to ask about seemed to be listed as ‘future research’. These include a fuller categorization, exploring motivations for adding comments, the presentation of comments in the user interface, and librarians’ (or archivists’) role in moderating user interaction.

I also enjoyed a couple of papers which presented ideas to do with improving information visualisation and user judgement using colours, layout and social navigation, all of which have some potential relevancy to the question of how best to present user-generated content.

Research and Advanced Technology for Digital Libraries, Proceedings of the 14th European Conference, ECDL 2010, Glasgow, UK, September 2010 is published as Lecture Notes in Computer Science 6273, available via SpringerLink, for those of you who have access.

And I have travelled twice on Glasgow’s baby underground train 🙂


I had a day at the Society of Archivists’ Conference 2010 in Manchester last Thursday; rather a mixed bag. I wasn’t there in time for the first couple of papers, but caught the main strand on digital preservation after the coffee break. It’s really good to see digital preservation issues get such a prominent billing (especially as I understand there were few sessions on digital preservation at the much larger Society of American Archivists’ Conference this year), although I was slightly disappointed that the papers were essentially show and tell rehearsals of how various organisations are tackling the digital challenge. I have given exactly this type of presentation at the Society’s Digital Preservation Roadshows and at various other beginners/introductory digital preservation events over the past year.  Sometimes of course this is precisely what is needed to get the nervous to engage with the practical realities of digital preservation, but all the same, it’s a pity that one or more of the papers at the main UK professional conference of the year did not develop the theme a little more and stimulate some discussion on the wider implications of digital archives.  However, it was interesting to see how the speakers assumed familiarity with OAIS and digital preservation concepts such as emulation. I suspect some of the audience were left rather bewildered by this, but the fact that speakers at an archives conference feel they can make such assumptions about audience understanding does at least suggest that some awareness of digital preservation theory and frameworks is at last crawling into the professional mainstream.

I was interested in Meena Gautam’s description of the National Archives of India’s preparations for receiving digital content, which included a strategy for recruiting staff with relevant expertise. Given India’s riches in terms of qualified IT professionals, I would have expected a large pool of skilled people from which to recruit. But the direction of her talk seemed to suggest that, in actual fact, NAI is finding it difficult to attract the experts they require. [There was one particular comment – that the NAI considers conversion to microfilm to be the current best solution for preserving born-digital content – which seemed particularly extraordinary, although I have since discovered the website of the Indian National Digital Preservation Programme, which does suggest that the Indian Government is thinking beyond this analogue paradigm.]  Anyway, NAI are not alone in encountering difficulties in attracting technically skilled staff to work in the archives sector.  I assume that the reason for this is principally economic, in that people with IT qualifications can earn considerably more working in the private sector.

It was a shame that there was not an opportunity for questions at the end of the session, as I would have liked to ask Dr Gautam how archives could or should try to motivate computer scientists and technicians to work in the area of digital preservation.  Later in the same session, Sharon McMeekin from the Royal Commission on the Ancient and Historical Monuments of Scotland advocated that archives organisations should collaborate to build digital repositories, and I and several others amongst the Conference twitter audience agreed.  But from observation of the real archives world, I would suggest that, although most people agree in principle that collaboration is the way forward, there is very little evidence – as yet at least – of partnership in practice. I wonder just how likely it is that joint repositories will emerge in this era of recession and budget cuts (which might be when we need collaboration most, but when in reality most organisations’ operations become increasingly internally focused).  Since it seems archives are unable to compete in attracting skilled staff in the open market, and – for a variety of reasons – it seems that the establishment of joint digital repositories is hindered by traditional organisational boundaries, I pondered whether a potential solution to both issues might lie in Yochai Benkler’s third organisational form of commons-based peer-production: as the means both to motivate a community of appropriately skilled experts to contribute their knowledge to the archives sector, and to build sustainable digital archives repositories in common.  There are already of course examples of open source development in the digital archives world (Archivematica is a good example, and many other tools, such as the National Archives of Australia’s Xena and The (UK) National Archives’ DROID are available under open source licences), since the use of open standards fits well with the preservation objective.
Could the archives profession build on these individual beginnings in order to stimulate or become the wider peer community needed to underpin sustainable digital preservation?

After lunch, we heard from Dr Elizabeth Shepherd and Dr Andrew Flinn on the work of the ICARUS research group at UCL’s Department of Information Studies, of which my user participation research is a small part.  It was good to see the twitter discussion really pick up during the paper, and a good question and answer session afterwards.  Sarah Wickham has a good summary of this presentation.

Finally, at the end of the day, I helped out with the session to raise awareness of the UK Archives Discovery Network, and to gather input from the profession on how they would like UKAD to develop.  We asked for comments on post-it notes on a series of ‘impertinent questions’.  I was particularly interested in the outcome of the question based upon UKAD’s Objective 4: “In reality, there will always be backlogs of uncatalogued archives.” Are volunteers the answer?  From the responses we gathered, there does appear to be increasing professional acceptance of the use of volunteers in description activities, although I suspect our use of the word ‘volunteer’ may be holding back appreciation of an important difference between the role of ‘expert’ volunteers in archives and user participation by the crowd.
