Earlier this year, I was given the opportunity to kick off the first meeting of the Universities UK Open Access Repositories Working Group with reflections on the UK open access repository landscape. This blog post builds on the presentation and is informed by discussions both within the group and with other colleagues from the UK, but it should not be read as a statement from the working group.

Even so, it may be useful to give some context on the working group. In February 2016, the Department for Business, Innovation & Skills published an independent report on open access, written by Professor Adam Tickell. One of his recommendations was “that the British Library, Research Libraries UK and the Society of College, National and University Libraries (SCONUL) convene, with appropriate support, to advise as to the best mechanisms to ensure that there is at least one permanent copy of an open access publication and that due regard is given to long term curation of digital assets.” Established in February 2017, the Repositories Working Group is now looking at these and other issues in the repositories space. It acts as a forum to identify and discuss areas where common benefit can be delivered, and where action can be taken to realise the benefits for repositories and their users. The group has representatives from the HE sector, Jisc, RLUK, SCONUL, the Wellcome Trust, the publishing industry and the British Library. It is chaired by Anne Horn, with Sage providing the secretariat function.

The first observation on the UK repository landscape has to be how rich it is. It is hard to think of any higher education institution in the UK that does not have a repository function. A quick look at the numbers confirms this impression: OpenDOAR, the Directory of Open Access Repositories, lists 256 UK repositories, marking the UK as the country with the second-largest number of repositories in the world, behind only the United States (499). The only other countries that even reach triple-digit numbers are Japan (214) and Germany (202). What the numbers don’t show, and that is another exciting feature of the UK repository scene, is the level of collaboration and the existence of shared services. The White Rose Consortium, for example, runs a shared repository for Leeds, Sheffield and York, and some organisations provide hosting for other universities, including Edinburgh and ULCC.

More important than the number of repository instances is the strength and vitality of the community. It has its own professional membership organisation, the United Kingdom Council of Research Repositories (UKCoRR), bringing together over 400 individual members. UKCoRR is driven by a spirit of collaboration, and it gives repository staff a forum to openly discuss problems, find solutions and coordinate across institutions. It’s this wealth of experience and skills that makes UKCoRR such a unique organisation, even though it has to rely on individual contributions for anything it does.

The UK also has strong institutions supporting open access repositories. One of them is my former employer Jisc, which over many years has invested in the community, systems and services. Early on, Jisc supported the development of repository software such as EPrints, and it funded projects that enhanced institutional services and helped the sector develop skills and capacity. Currently Jisc offers a range of OA repository services that are used across the world, for example the Sherpa services that facilitate compliance checking and embargo management, and a set of related repository plugins. Other services include the aforementioned OpenDOAR, IRUS-UK (a statistics aggregation service for repositories) and the OA aggregator CORE.

Another UK institution that has long supported open access is the Wellcome Trust. While Wellcome currently focuses mostly on gold OA, they have still made a very significant contribution to the repository scene through their support for Europe PubMed Central. Europe PMC, which the British Library helped set up, provides access to over 3.7 million biomedical research articles, making it the largest subject repository in the world. The UK community benefits from being able to use other international repository services such as Zenodo and a growing family of subject repositories including arXiv, RePEc and new preprint services like bioRxiv. Equally, the UK benefits from, and often has a leading role in, the development of standards and protocols like SWORD and OAI-PMH. OAI-PMH is, for example, being used to power another unique part of the UK repository infrastructure: EThOS, the e-theses repository managed by the British Library, holds the full text of some 160,000 doctoral theses and points to another 300,000 thesis records across university repositories.

Leaving theses and subject repositories aside, none of the other achievements mean much if there isn’t enough content in institutional repositories to justify the effort in developing them. Over the last four years in particular we have seen a massive increase in content deposited to UK institutional repositories. Let’s take the example of Spiral, the repository of Imperial College. In 2012, about 300 manuscripts of scholarly articles were deposited in Spiral; by 2016 the annual figure had grown to eleven thousand. The dramatic increase at Imperial College and other universities is partly due to the efforts of many dedicated staff across the sector and to the funding that universities and funders have made available for repository management. However, the key drivers are the strong funder mandates for open access, such as the OA policy for the post-2014 REF (the UK’s research assessment framework). This policy in particular has encouraged, or perhaps we should say forced, vice-chancellors across the country to develop an active interest in repositories.

So, we have an institutional infrastructure, support services, a strong and skilled community, recognition of repositories as strategic systems and healthy deposit rates across the country – if everything is going so well, why the need for UUK to focus on repositories? Despite all that has been achieved there are still a host of problems that make the ecosystem less efficient and effective than it should be, and some risks for the future of open access that need to be addressed.

On the technical side, a major issue is with the repository systems. While most institutions have a repository, many use older versions of software that may no longer be supported, lack key features, prevent the integration of required modules and complicate the work of repository staff and authors through usability issues. In the early stages of the institutional repository movement in particular, systems were set up by enthusiasts who overcame the (initial) lack of features by customising systems to a great extent. Several years later, with staff having retired or moved on, the skills to update to a newer platform may no longer be available within the institution. Equally, there is often a lack of funds and IT support that prevents regular updates and content migration. In smaller institutions in particular, individual repository staff are often required to simultaneously develop policy, provide reports, engage with authors, manage deposits and act as IT support. Even where one person has all these skills, the workload is such that migrating to a new platform while maintaining business as usual is not feasible. As a result, the institutional repository landscape is fragmented: institutions may only use a handful of repository systems, but in so many versions and with so many specific customisations that integrating new functionality across the sector can be a nightmare.

It should also be noted that there are some concerns about the architecture, and therefore the long-term viability, of some of the repository platforms. Take the example of EPrints: it is a popular and easy-to-deploy platform, but it is based on Perl, a programming language that has been in steady decline for over a decade now.

Leaving specific technologies aside, usability and user experience remain a problem. Even in the case of the latest open source packages, user experience design may not always have been the first priority. In addition, skilled staff are required to design and implement good workflows locally. As a result, some of the repository user interfaces can be clunky and look rather dated. For busy academics, open access is sometimes just another administrative requirement, which makes simple, easy-to-understand and ideally pleasant workflows key for encouraging deposit.

Another issue in this context is the lack of integration of repositories with other institutional systems. Academics are frustrated by having to enter the same information over and over again, which costs them time and increases the chance of making errors. This can at least partially be overcome by systems integration, especially with current research information systems, HR, finance and grant management solutions. However, this requires using relatively up-to-date repository systems and enough skilled developer capacity for building integrations.

Such integrations are essential for making reporting more accurate and less time-consuming. As a result of our strong open access mandates, the workload for reporting has significantly increased. A big problem in this context is that we still struggle to identify whether a publication is available as open access and, due to a lack of metadata, whether it is compliant with a specific policy. The use of identifiers such as DOI, ORCID and FundRef can help, but in order for these solutions to deliver their full potential we need uptake by all stakeholders and systems integrations for seamless exchange of data. Instead of getting into a long and depressing rant on open access reporting, I can just say “here’s one I prepared earlier” and refer you to a series of blog posts I wrote about this while working at Imperial College.

Lack of sufficient metadata and identifiers also relates to issues around versioning. For an article we may now have a preprint, a green OA version in an institutional repository and the version of record. And another green version deposited by one of several co-authors in another institutional repository. And perhaps another, which may or may not be 100% identical to the first. Repository staff usually try to add the DOI for the version of record to a deposit, but it may not always be available in time. As the REF open access policy encourages, or rather requires, early deposit, universities cannot reliably identify whether a co-author at another university has already deposited a manuscript (due to publisher embargoes that delay availability, but also through a lack of identifiers). To ensure compliance, research organisations have to ask their research staff to deposit everything, thereby duplicating work and potentially creating different versions of the same article. The Jisc Publications Router has the potential to address these workflow issues, provided Jisc and the sector can encourage more publishers to push manuscripts into this routing mechanism (currently only a few publishers support it, and most of these make UK content available openly anyway, which makes them less relevant than subscription publishers in this context).
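To illustrate why a shared identifier matters here, the following is a minimal sketch of how deposits of the same article in different repositories could be matched once a DOI is present. It is not how any existing repository system or the Jisc Publications Router works; the function names, the record fields and the sample records (including the placeholder DOI) are all invented for illustration.

```python
from collections import defaultdict

# Common ways a DOI is written in metadata; stripped so that
# different spellings of the same DOI compare as equal.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw):
    """Return a lower-cased bare DOI, or None if no DOI is given."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def group_by_doi(records):
    """Group deposit records by normalised DOI.

    Returns (groups, unmatched): groups maps each DOI to the records
    that carry it; unmatched lists records without a DOI, which cannot
    be linked to other versions by this method.
    """
    groups = defaultdict(list)
    unmatched = []
    for record in records:
        doi = normalise_doi(record.get("doi"))
        if doi is None:
            unmatched.append(record)
        else:
            groups[doi].append(record)
    return dict(groups), unmatched


# Invented sample data: two deposits of one article plus an early
# deposit made before the DOI was known.
SAMPLE_RECORDS = [
    {"repository": "repo-a", "version": "accepted manuscript",
     "doi": "https://doi.org/10.1234/EXAMPLE.1"},
    {"repository": "repo-b", "version": "accepted manuscript",
     "doi": "doi:10.1234/example.1"},
    {"repository": "repo-c", "version": "preprint", "doi": None},
]

if __name__ == "__main__":
    groups, unmatched = group_by_doi(SAMPLE_RECORDS)
    for doi, deposits in groups.items():
        print(doi, [d["repository"] for d in deposits])
    print("without DOI:", [d["repository"] for d in unmatched])
```

The third record shows the limitation described above: a manuscript deposited before the DOI is available cannot be matched this way, so other metadata or a routing service would still be needed.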

Inefficiency arguably does not just exist in the manual deposit process, but also in the duplication of technical work and systems support across universities that host and maintain their own locally installed repositories. This links to an area that particularly concerns me from the perspective of the organisation that is tasked to preserve the UK’s scholarly and published output: the majority of institutional repositories lack digital preservation capability. Considering the volume of OA content and the different versions that may well be cited but aren’t preserved, this is a concern. Jisc is currently running a pilot for a research data shared service that could be expanded to include open access content. However, the transition to service has not yet been approved and it is not clear whether all UK universities would subscribe to such a service. The British Library is harvesting the UK web domain for preservation purposes, and that includes open access repositories. However, the legal framework for non-print legal deposit restricts access to BL premises, and depending on the local repository configuration not all open access content may be included. So while we have elements of a solution, further effort is required to develop a stable, sustainable OA preservation service (a service that in my mind should not just be a dark archive but also allow access where required).

In this context it is probably fair to say that access is often more of an afterthought from a repository perspective. Repository teams are already stretched through business as usual, including managing the flow of deposits, keeping systems from falling over and encouraging authors to upload their manuscripts. This leaves little time to put effort into facilitating reuse, for example by making it easy to text and data mine from repositories. A particular problem here is licensing, or specifically the restrictions publisher policies put on repositories. Some journals and publishers allow deposit in a way that facilitates text mining, but we need to move to a situation that unambiguously allows full text and data mining of open access repository content across jurisdictions.

To sum up: a lot has been achieved in the UK (and internationally), but we still have a long way to go until we have an effective and efficient scholarly communications system (and I have not even gone into wider issues around gold open access, metrics, impact and peer review).

Our working group is preparing a set of recommendations for the UUK Open Access Coordination Group. To ensure that these are informed by experience on the ground, we used the UKCoRR network to reach out to repository staff to identify and rank the issues they struggle with. I thought I would share the list (as prioritised by staff from 47 UK universities) with you to set my comments in context:

  1. Reporting facilities not sufficient for internal reporting
  2. Technical support: not enough capacity
  3. Not enough staff resource for operational management
  4. Management effort for journal embargoes
  5. Reporting facilities not sufficient for funder reporting
  6. Usability and user interface issues
  7. Technical support: lack of skills / capability
  8. Lack of integration with identifiers (such as ORCID or DOIs)
  9. Tracking/integrating AAMs deposited in subject/other institutional repositories (REF OA policy)
  10. Difficulties with integration with university systems (other than CRIS)
  11. Difficulties with CRIS system integration
  12. No or limited preservation functionality
  13. Difficulties with maintaining custom functionality
  14. Not enough resource to update from older/out-of-date version of repository software
  15. Concerns about sustainability of the underlying repository software package
  16. Issues with changing publisher and/or funder policies changing compliance status of articles
  17. Linking publications to relevant funders
  18. Limitation of reuse through deposit licence (‘all rights reserved’), e.g. for text and data mining
  19. Linking publications to related datasets (and vice versa)
  20. Limited/no facilities (such as API) to support text and data mining

To me, the main outcome of the consultation was having confirmation that we had not missed any major issues in our analysis. I would caution against putting too much emphasis on the ranking as such. Many of these issues are interrelated to the extent that it can be hard to separate or prioritise them. As a result, the ranking is actually very close, with only the top and bottom three or four really standing out, and even so, for almost every issue we had at least one vote ranking it at the top and one at the bottom. Generally, this list emphasises the issues that operationally cause most problems for repository teams. Other stakeholders may prioritise differently: for example, I would assume funders like HEFCE would rate issues 17-20 higher, whereas Adam Tickell’s report emphasised 12 as a key concern from a national perspective.

More important than the exact ranking, therefore, is identifying the stakeholders that can make a difference in addressing these problems and considering what actions should be taken. This is what the working group is currently investigating.

Dr Torsten Reimer
Head of Research Services at the British Library