Summary of the RLUK Hack Day by The European Library
Forward-thinking developers passed through the traditional space of the Senate Library’s beautiful Special Collections area to participate in RLUK’s first Linked Open Data (LOD) Hackathon in London in May. In the lead-up to the event, The European Library published a series of LOD case studies. Together, the Hackathon and supporting case studies open up insights into early adoption of LOD.
“Is LOD really the next big thing for libraries? The jury is still out”, said RLUK (Research Libraries UK) Director Mike Mertens, as he introduced the event. He underpinned the exploratory nature of the day with a number of key objectives, notably the exploitation of rich bibliographic data in new ways in order to promote new skills in libraries, open up data for nimble library solutions, and promote RLUK’s rich open metadata.
Louise Edwards, Director of The European Library, saw the event as an important practical application of the dataset (which draws in contributions from RLUK and other nationally-representative organisations). “We’re aggregating this data to foster research across Europe”, she said. “To date, the dataset comprises 200 million bibliographic records, 57 million of which are made available through an API which will soon be able to handle LOD. Our overriding priority is openness.” Both RLUK and The European Library elicited feedback on the data and access mechanisms provided.
Introducing the hackers
The event attracted a range of hackers. Michael Jones and Glen Robson at the National Library of Wales planned to link a World War One diary to digitised newspapers and other datasets. Owen Stephens, a library consultant who also facilitated the event, aimed to develop a Chrome extension that matched people on The European Library website with individuals who share their date of birth. The Hackathon gave Sara Wingate Gray and Kate Lomax, from artefacto, the chance to develop an interface where users could explore LOD at their chosen level of complexity. Berrisford Edwards from the University of Manchester was interested in visualising the statistical content of the dataset.
For some, the motivation was simply to learn about LOD, keying into RLUK’s objective of encouraging skills diversification. Henry Morgan, an LIS student at UCL, was planning a dissertation on LOD. Shannon Searle from Cranfield University wanted to explore the German and Russian experiences of World War One as a learning exercise. Frank Herrmann, a Physics researcher from the University of Maryland, was there to find out more about data structures.
What key themes emerged from the LOD hackathon and case studies?
Creating meaning and stories out of information
“Creating stories and meaning out of information is an interesting proposition”, said Mike Mertens at the start of the Hackathon. The semantic approach, with its structured flow of data, creates new possibilities for story-telling and context-building.
National Library of Wales hackers presented an engaging story from the outer reaches of World War One. They had discovered the journal of British diplomat Lewis Epstein, who was in Constantinople at the time of the Dardanelles campaign. In Welsh Newspapers Online, they found one article on the Dardanelles and one on Epstein. They found relevant entries on VIAF and Wikipedia as well as a digitised edition of the journal in Google Books. They created links to the RLUK triplestore and to the newspaper collection, and also built a prototype which crowdsourced the validation of links.
King’s College London has created an LOD vocabulary of military battles from its archive catalogues. “It’s no longer about filling out fields; it’s about expressing and coding relationships”, said Geoff Browell. “Archivists can play a story-telling role, drawing on contextual knowledge such as the family who deposited the collection.”
Reaching out to researchers and end users
Focus on end-user accessibility is intensifying across the LOD community. The British Library was one of two case studies that emphasised the selection of data that has broad appeal for reuse.
artefacto, by extracting data related to the incongruous subjects of pirates and cats, cleverly took the focus away from the content to explore imaginative ways of visualising data, including a timeline of pirate and cat publications, which could be discovered separately or together.
National Library of Spain has developed a public-facing portal, which the team made as accessible as possible by consulting a usability expert and benchmarking other LOD solutions for navigability. The portal will help developers to handle the data as well as transform public access to the data.
Exploring the value of external tools and datasets
The British Library speaks for all the case study libraries in its conviction that drawing on tools and experiences available elsewhere is often preferable to home-grown solutions.
National Library of France has delivered a portal which applies widely-adopted vocabularies such as SKOS, FOAF and Dublin Core, and integrates data elements from schema.org and from the Open Graph protocol to represent its pages in social networks.
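To illustrate what combining several widely-adopted vocabularies can look like in practice, here is a minimal Python sketch. It is not the National Library of France's data model: the example.org URIs, the sample values and the helper names (`expand`, `describe_author`, `to_ntriples`) are all assumptions made for illustration. Only the namespace URIs for RDF, SKOS, FOAF and Dublin Core Terms are the published ones.

```python
# Published namespace URIs for the vocabularies named in the text.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(curie):
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, local = curie.split(":", 1)
    return NAMESPACES[prefix] + local


def describe_author(author_uri, name, work_uri, work_title):
    """Return (subject, predicate, object, is_literal) tuples.

    Hypothetical modelling choice: one author and one work, described
    with terms drawn from three different vocabularies.
    """
    return [
        (author_uri, expand("rdf:type"), expand("foaf:Person"), False),
        (author_uri, expand("foaf:name"), name, True),
        (author_uri, expand("skos:prefLabel"), name, True),
        (work_uri, expand("dcterms:title"), work_title, True),
        (work_uri, expand("dcterms:creator"), author_uri, False),
    ]


def to_ntriples(triples):
    """Serialise the tuples as N-Triples lines.

    Simplification: only backslashes and double quotes are escaped, so
    literals containing line breaks are not handled.
    """
    lines = []
    for subject, predicate, obj, is_literal in triples:
        if is_literal:
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = describe_author(
        "http://example.org/author/1",  # hypothetical URI
        "Example Author",               # invented sample value
        "http://example.org/work/1",    # hypothetical URI
        "An Example Title",             # invented sample value
    )
    print(to_ntriples(sample))
```

Reusing shared terms in this way means that a consumer who already understands FOAF or Dublin Core can read the data without learning a library-specific schema.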
National Library of the Netherlands has sourced author year-of-death data to make more accurate copyright judgements. However, this case study warned of the incompleteness of well-known external datasets.
Staff development
For Mike Mertens, encouraging skills diversification was an important objective, and this is vindicated by the number of case studies that emphasised staff development.
British Library identified staff development as a significant benefit, despite the steep learning curve. Neil Wilson urged libraries to provide training and “cultivate a culture of enquiry and innovation to broaden perspectives on new possibilities”.
King’s College London believes that LOD will take staff to new places as they pool their deep understanding and become co-producers of content. Geoff Browell did, however, identify the need to provide non-technical staff with intuitive tools.
Retro-fitting traditional library data formats to LOD
The Hackathon was only possible because of the successful conversion of catalogue data to LOD, but almost every case study pointed to the challenges involved. Typical of these was the British Library, which described its legacy data as “problematic” and a “constraint on data modelling”.
Cambridge University Library has delivered an open-source tool to convert MARC21 to RDF triples, and has also added substantial enrichments to its catalogue records, despite “limitations” in the conversion of MARC and AACR2 formats.
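As a rough illustration of what converting MARC21 to RDF triples involves, the Python sketch below maps two MARC21 fields to Dublin Core properties. It is not Cambridge University Library's tool. The record structure (a plain dictionary), the `FIELD_MAP` table, the function name `marc_to_triples`, the example.org base URI and the sample record are all assumptions for illustration. The field tags themselves are standard MARC21: 001 is the control number, 100 $a the personal name main entry and 245 $a the title.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping from (MARC tag, subfield code) to an RDF property.
FIELD_MAP = {
    ("245", "a"): DCTERMS + "title",
    ("100", "a"): DCTERMS + "creator",
}


def marc_to_triples(record, base_uri):
    """Convert a simplified MARC-like record into (subject, predicate, literal) triples.

    Assumed record shape: {"001": control_number,
                           "fields": [(tag, [(subfield_code, value), ...]), ...]}
    """
    subject = base_uri + record["001"]
    triples = []
    for tag, subfields in record["fields"]:
        for code, value in subfields:
            predicate = FIELD_MAP.get((tag, code))
            if predicate is None:
                continue  # unmapped subfields are dropped in this sketch
            # Cataloguing rules leave trailing punctuation such as " /" or ","
            # in subfield values; strip it so the literal is clean.
            triples.append((subject, predicate, value.rstrip(" /:;,")))
    return triples


if __name__ == "__main__":
    sample_record = {  # invented sample data
        "001": "b1234",
        "fields": [
            ("100", [("a", "Example, Author,")]),
            ("245", [("a", "An example title /"), ("c", "Author Example.")]),
        ],
    }
    for triple in marc_to_triples(sample_record, "http://example.org/record/"):
        print(triple)
```

Even in this small sketch the author ends up as a text string rather than a link to an authority record, and punctuation has to be cleaned out of the values, which hints at why the case studies describe legacy formats as a constraint.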
German National Library has retained its existing infrastructure, which meant that MARC21 record changes impacted the LOD service until the Library moved to a common release cycle for all data formats.
At the end of a hard day’s hacking…
Over drinks and nibbles, after a long day in which hackers had demonstrated their focus and commitment to the event, Owen Stephens, who had been commissioned by RLUK to moderate the day, announced the winners in three categories.
National Library of Wales was the well-deserved winner of the Best Overall Hack. Theirs was a solution that was well-planned (they joked that the long train journey from Aberystwyth had afforded this), surfacing a neglected fragment of history that was bibliographic but also rich in context, and highly linkable to external datasets.
artefacto won the Best Presentation award, taking participants through their cats and pirates interface, using humour and striking visuals to deliver a serious demonstrator of user accessibility.
Berrisford Edwards from the University of Manchester won the prize for “Best Value to RLUK”. He meticulously documented his issues and presented recommendations around the dataset and API.
This was the very reason RLUK had organised the Hackathon: to elicit feedback and gain a better understanding of developer needs and aspirations. The event also provided Nuno Freire, developer at The European Library, with specific ideas and recommendations for making further improvements to data access. Together, hackers worked with data providers to meet the event’s objectives of engaging the community, strengthening expertise, generating ideas and using LOD “in anger” to test its real-world applicability and its potential to create library solutions.
“By publishing their bibliographic information and authority files as linked data under an open licence, libraries significantly lower the barriers to data reuse, and can also benefit from third-party sources. The potential of this movement cannot be underestimated.” Ulrike Junger, Head, Division of Cataloguing, German National Library (DNB).
Republished with permission of The European Library