Three years ago, I wrote this list of potential research-support roles for a library in the digital environment:
1. institutional repository for pre-prints and post-prints of the research organization's publications
2. data repository for the research conducted at the organization
3. providing advanced (data/publication/information/discovery/etc.) tools that integrate into the researcher's workflow
These are numbered for convenience, not importance.
What do I think, three years on?
Institutional Repositories
1. While institutional repositories are valuable, they currently benefit primarily organisations, not researchers. They provide a unified view of an organisation's published output. Individual researchers may simply be focused on getting published, or, if they do want to disseminate their work, they may just post it to their own website (and, sad to say, it may get a better Google rank there than in their repository).
Because of this, there is still a huge content recruitment challenge for IRs. I saw this at SPARC Digital Repositories 2008, where, to be blunt, the tone seemed to be mostly "we built it and they didn't come". And in fairness to individual organisations, even the Wellcome Trust, with its billions and its mandatory policy, isn't getting good compliance:
The Wellcome Trust have been monitoring compliance rates, and have been disappointed to find that these are currently very low. As a result of this, they intend to more actively monitor compliance, and in future will be contacting researchers who have not had articles published as Open Access papers.
Wellcome gets tough on Open Access depositions - Peter Murray-Rust's blog - March 7, 2009
Even if you just look at the language we use ("recruitment", "compliance"), it's clear that IRs have become a matter of coercion, which should make us seriously question their value. The good news is that there is a lot of good thinking about this: for example, Les Carr suggests making the repository a file system for researchers, and many have suggested making repositories more web-friendly (or eliminating this special container we call an IR altogether and just using regular web tools).
If providing an institutional repository is your primary or core value to the organisation, you are putting yourself at tremendous risk, because a savvy administrator may notice that you can purchase hosted repository services from BePress and BMC Open Repository. Any time a primary function (however valuable) has become commodity, you are at risk.
Data
2. Data is a strange thing. Unlike article repositories, which face resistance from publishers, data enjoys pretty much universal agreement amongst all parties that it should be openly shared. Yet most of it is not being shared, for many reasons. Data can have very complicated licensing. By its very nature, it is complex to manage and interpret. And researchers who are, to be blunt, somewhat indifferent to sharing their papers may actively resist sharing their data, as they may feel it is the foundation of their future research. There's lots of good work being done: just today Peter Murray-Rust points to some practical developments in Open Data in Science, and John Wilbanks and his team have been doing deep and valuable work on data licensing as part of Science Commons (see e.g. Databases and Creative Commons). But we are a long way from massive, agreed-upon sharing and preservation of data. This is also a risky area on which to bet your organisation, but a good area for small, practical experiments in data sharing and preservation with willing researchers. Unfortunately, Canada lacks an equivalent of the UK's national Digital Curation Centre to help make this happen here. There is an effort to gather information as part of Research Data Canada, but I don't know how widely known it is.
This is an activity that will have great value once the hugely complicated issues begin to be resolved. Data is very different from journal articles: it lacks a standard format, and the resources it can consume (into the petabytes) make it a daunting task for any organisation or group of organisations to take on. I really admire the practical work that Amazon is doing with Public Datasets (thanks, I suspect, in large part to the vision of Deepak Singh). The most practical things we can do right now are to share what data we have, think about what open data will mean, and try to get more and more data openly shared.
Advanced Discovery and e-Science
3. This is an important area that I think offers enormous potential for libraries. In Canada it is also hugely challenging because we have no national equivalent of the US NSF Cyberinfrastructure or the UK National e-Science Centre. The best we can do is kind of grassroots e-science, which is kind of a contradiction in terms, since the common understanding of e-science is that it is about tackling large scale problems with large scale infrastructure.
Where I think progress is possible is on a smaller scale: building advanced discovery tools and integrating them with researcher workflows piece by piece. (This shouldn't be read as "build all"; integrating includes, e.g., helping researchers fit Connotea, Zotero, etc. into their workflows.) Many researchers are not very web-aware beyond Google searching, yet there are all kinds of tools they could use, and the library has a role in providing information about those tools. In the near term, there are some very quick wins in simply providing better discovery and information management tools, most of which are already available for free on the web. In the medium term, there are intriguing possibilities to support researchers with Virtual Research Environments. And in the long term, true semantic discovery may be possible, with very advanced computational and visualisation tools supporting very sophisticated computer- and data-driven science.
Many pieces of this environment are being built. The library has a key role in integrating them and educating researchers about them. As indicated above, this is everything from
basic citation management - Connotea, Zotero and many others
to
Virtual Research Environments as being investigated by JISC and the British Library (PDF)
to
text mining on full-text, as planned by UKPMC
to
semantic discovery as is being pioneered by EMBL, the Biogen Idec library, and many others in many fields (too many to list, but just in biomed see e.g. the Semantic Mining in Biomedicine symposia and "Pharmas Nudge Semantic Web Technology Toward Practical Drug Discovery Applications")
As you can see, this is an exciting space with many activities going on. The (research) libraries that can have a meaningful presence in this space (which currently has some daunting technical and infrastructure requirements at the high end) will, I believe, be able to sustain themselves by providing truly relevant and valued services to their researchers.
An important point must be made here: if you don't have some point of connection with your researchers (some library-provided discovery tools on your site and in their browser), then you have no point of contact or credibility on which to base all the advanced capabilities you may want to bring to bear.
UPDATE: I wanted to add some closing thoughts about the focus of this post. I'm a technology planner (that's a large part of what my rather grander job title, "enterprise technology architect", means). That means my main focus is on the technologies the organisation uses: not the specific implementations (DSpace vs. Fedora), but the general classes of technology-enabled business functions that the organisation provides. So what I'm working through above is which kinds of approaches will be sustainable technology differentiators. That is, where can your library add technology-supported value that researchers will recognise? This has some implications for people's roles, the jobs librarians would do, but I'm not examining that aspect. ENDUPDATE
Some of the topics about data and e-science that I have discussed above will be covered in the ICSTI 2009 conference in Ottawa this June (about which more in the following posting).
"If providing an institutional repository is your primary or core value to the organisation, you are putting yourself at tremendous risk, because a savvy administrator may notice that you can purchase hosted repository services from BePress and BMC Open Repository. Any time a primary function (however valuable) has become commodity, you are at risk."
We run bepress and still need a staff of three to upload material for authors, ensure copyright compliance, embed the IR in other processes, disseminate usage statistics, train admin staff, communicate with bepress and request changes and updates, attend research committee meetings, and run a very active awareness programme. Hosted services only take care of the tech stuff - there's a great deal more to running a repository than that. Did you speak to repository managers about your ideas?
S. Meece
Digital Collections Officer
University of Surrey, Guildford, U.K.
Posted by: S. Meece | March 17, 2009 at 08:30 AM
@S.Meece I recognise that an IR requires a tremendous investment in terms of people and process in order to be successful, and I wasn't in any way trying to understate the critical role of the repository managers.
As I said in the update to this post, in subsequent posts, and in discussion on FriendFeed, I'm looking at this from a technology differentiation point of view. So read it as "if (as an IT group) providing (the technology infrastructure for) an institutional repository is your primary or core value" then be aware that that *IT service* can be replaced with a hosted solution.
The tagline for this blog is "Thoughts on the use of technology and other issues" - technology is always the perspective I'm coming from. I realise this often creates a communication gap in this blog, and I appreciate feedback on how to clarify my statements.
A longer version of this answer is: assuming you have run your full planning process and designated the institutional repository as a priority, with full resourcing, don't assume either 1) that simply hosting the IR will provide a measure of "protection" for your organisation from technology or staff cuts, or 2) that providing the repository service will necessarily be perceived as a high value-add. I'm coming from the perspective of an organisation that is getting a 50%+ cut, so I'm preoccupied with what services the library can provide in order to continue to exist.
Posted by: Richard Akerman | March 17, 2009 at 09:28 AM
Richard wrote: "The best we can do is kind of grassroots e-science, which is kind of a contradiction in terms, since the common understanding of e-science is that it is about tackling large scale problems with large scale infrastructure."
I wonder if it may be helpful to use the term "eResearch", standard in Australia and I think in increasing use in the UK and US, to cover the whole spectrum of applications of advanced information and communications technologies to the practice of research. eResearch encompasses not only big e-science, but a long tail of smaller research projects in all disciplines benefiting from data management, collaborative environments, and/or high-performance computing.
I certainly endorse the distinctions you draw between publications in repositories and sharing of data. Commencing with "small, practical experiments in data sharing and preservation with willing researchers" (but aligned with existing work, not reinventing wheels!), and building upwards from there as we gain experience, is for me the right approach to development of infrastructure for research data preservation, management and sharing.
Posted by: Jim Richardson | March 30, 2009 at 01:00 AM