Posts categorized "Institutional Repository"

March 14, 2009

revisiting potential research-support roles for the library

Three years ago, I wrote this list of potential research-support roles for a library in the digital environment:

  1. institutional repository for pre-prints and post-prints of the research organization's publications
  2. data repository for the research conducted at the organization
  3. providing advanced (data/publication/information/discovery/etc.) tools that integrate into the researcher's workflow

These are numbered for convenience, not importance.
What do I think, three years on?

Institutional Repositories

1. While institutional repositories are valuable, they currently benefit primarily organisations, not researchers.  They provide a unified view of an organisation's published output.  For individual researchers, their priority may be just on getting published, or if they do want to disseminate their work, they may just post it to their own website (and sad to say, may get more Google rank having it there than in their repository).

Because of this property, there is still a huge content recruitment challenge for IRs.  I saw this at SPARC Digital Repositories 2008 where, to be blunt, the tone seemed to be mostly "we built it and they didn't come".  And in fairness to individual organisations, even Wellcome with its billions and its mandatory policy isn't getting good compliance:

The Wellcome Trust have been monitoring compliance rates, and have been disappointed to find that these are currently very low. As a result of this, they intend to more actively monitor compliance, and in future will be contacting researchers who have not had articles published as Open Access papers.

Wellcome gets tough on Open Access depositions - Peter Murray-Rust's blog - March 7, 2009

Even if you just look at the language we use - "recruitment", "compliance" - it's clear that IRs have become about coercion, which should be making us seriously question their value.  The good news is that there is a lot of good thinking about this - for example Les Carr suggests the idea of making the repository a file system for researchers, and many have suggested making repositories more web-friendly (or eliminating this special container we call IRs altogether, and just using regular web tools).

If providing an institutional repository is your primary or core value to the organisation, you are putting yourself at tremendous risk, because a savvy administrator may notice that you can purchase hosted repository services from BePress and BMC Open Repository.  Any time a primary function (however valuable) has become commodity, you are at risk.

Data

2. Data is a strange thing.  Unlike the publisher resistance to article repositories, there is pretty much universal agreement amongst all parties that data should be openly shared.  There are many reasons it is mostly not being shared.  Data can have very complicated licensing.  By its very nature, it is complex to manage and interpret.  And researchers who are, to be blunt, somewhat indifferent to sharing their papers, may actively resist sharing their data as they may feel it is the foundation of their future research.  There's lots of good work being done - just today Peter Murray-Rust points to some practical developments in Open Data in Science - and John Wilbanks and his team have been doing deep and valuable work on data licensing as part of Science Commons (see e.g. Databases and Creative Commons), but we are a long ways away from massive, agreed-upon sharing and preservation of data.  Also a risky area in which to bet your organisation, but a good area to be doing small, practical experiments in data sharing and preservation with willing researchers.  Canada unfortunately lacks an equivalent of the UK's national Digital Curation Centre to help make this happen here.  There is an effort to gather information as part of Research Data Canada, but I don't know how widely known it is.

This is an activity that will have great value, once all the hugely complicated issues begin to be resolved.  Data is very different from journal articles - it lacks a standard format, and the resources it can consume - into the petabytes - make it a daunting task for any organisation or set of organisations to take on.  I really admire the practical work that Amazon is doing with Public Datasets (thanks I suspect in large part to the vision of Deepak Singh).  The most practical things we can do right now are to share what data we have, think about what open data will mean, and try to get more and more data openly shared.

Advanced Discovery and e-Science

3. This is an important area that I think offers enormous potential for libraries.  In Canada it is also hugely challenging because we have no national equivalent of the US NSF Cyberinfrastructure or the UK National e-Science Centre.  The best we can do is kind of grassroots e-science, which is kind of a contradiction in terms, since the common understanding of e-science is that it is about tackling large scale problems with large scale infrastructure.

Where I think things are possible is on the smaller scale, building and integrating advanced discovery and integration with researcher workflows piece-by-piece.  (This shouldn't be read as "build all" - integrating includes e.g. helping researchers integrate Connotea, Zotero, etc. into their workflows.)  Many researchers are not that web-aware beyond Google searching - there are all kinds of tools that they could use.  The library has a role in providing information about those tools.  In the near term, there are some very quick wins just providing better discovery and information management tools, most of which are already available for free on the web.  In the medium term, there are intriguing possibilities to support researchers with Virtual Research Environments.  And in the long term, true semantic discovery may be possible, with very advanced computational and visualisation tools supporting very sophisticated computer- and data-driven science.

Many pieces of this environment are being built.  The library has a key role in integrating them and educating researchers about them.  As indicated above, this is everything from

basic citation management - Connotea, Zotero and many others
to
Virtual Research Environments as being investigated by JISC and the British Library (PDF)
to
text mining on full-text, as planned by UKPMC
to
semantic discovery as is being pioneered by EMBL, Biogen Idec library, and many others in many fields (too many to list, but just in biomed see e.g. Semantic Mining in Biomedicine Symposiums and "Pharmas Nudge Semantic Web Technology Toward Practical Drug Discovery Applications")

As you can see this is an exciting space with many activities going on.  The (research) libraries that can have a meaningful presence in this space (which currently has some daunting technical and infrastructure requirements at the high end) will, I believe, be able to sustain themselves by providing truly relevant and valued services to their researchers.

An important point must be made here: if you don't have some point of connection with your researchers - some discovery tools on your site and in their browser that the library provides - then you have no point of contact or credibility upon which to base all the advanced capabilities you may want to bring to bear.

UPDATE: I wanted to add some closing thoughts about the focus of this post.  I'm a technology planner (that's a large part of the meaning of the rather grander "enterprise technology architect" job description I have).  That means my main focus is on the technologies the organisation uses.  Not the specific implementations (DSpace vs. Fedora) but the general classes of technology-enabled business functions in the organisation that are provided.  So what I'm working through above is what kinds of approaches will be sustainable technology differentiators.  That is, where can your library add technology-supported value that will be recognised by researchers.  This has some implications for the people roles, the jobs the librarians would do, but I'm not examining that aspect.  ENDUPDATE

Some of the topics about data and e-science that I have discussed above will be covered in the ICSTI 2009 conference in Ottawa this June (about which more in the following posting).

November 18, 2008

SPARC 2008 on the web

SPARC Digital Repositories Meeting 2008 - November 17-18, 2008

A few channels for you to follow the meeting:

  • Twitter hashtag #sparc08
  • I made a FriendFeed room sparc2008 that includes the RSS feed from the Twitter hashtag (anyone is welcome to add additional content)
  • I have been trying an experiment with the CoverItLive "liveblogging" tool -

    SPARC 2008 liveblog experiment

  • Dorothea Salo blogged about John Wilbanks' inspiring call to action (just share, innovate and hack, rather than talking about it)

November 17, 2008

SPARC 2008 liveblog experiment

Just an experiment with CoverItLive - may fail totally, we'll see.

September 17, 2008

enabling IR compliance by pre-populating

I wanted to highlight some great work by Mark Leggott and his team at UPEI in building a system that demonstrates how you can connect a bunch of services and technologies together using a good platform, in order to get a powerful combined experience.

In brief (in my understanding), the Repository in a Box takes as input the metadata of your institution's academic output, then based on that metadata creates accounts for individual academics listing all of their papers, and on a per-article-record basis adds information from SHERPA/ROMEO about the policy of the publisher on OA, and an OpenURL link so you get to publisher and other appropriate copies.

So in a case where the policy allows deposit of the post-print PDF from the publisher, you're one click away from depositing that article.  If not, the author has all the metadata needed to find the right version to submit.
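
As a rough illustration of that flow (not the UPEI code, which I haven't seen), here is a minimal Python sketch.  It groups publication metadata by author, attaches a publisher policy to each record, and builds an OpenURL link.  The PUBLISHER_POLICY table is a hypothetical stand-in for a real SHERPA/RoMEO lookup, and the resolver address, ISSNs and record fields are all made up for the example.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Hypothetical stand-in for a SHERPA/RoMEO lookup, keyed by journal ISSN.
PUBLISHER_POLICY = {
    "1234-5678": "publisher PDF may be deposited",
    "8765-4321": "author post-print only",
}

# Hypothetical link resolver base URL.
RESOLVER = "https://resolver.example.org/openurl"


def openurl(record):
    """Build an OpenURL 1.0 (key/encoded-value) link for a journal article."""
    params = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.atitle": record["title"],
        "rft.jtitle": record["journal"],
        "rft.issn": record["issn"],
        "rft.date": record["year"],
    }
    return RESOLVER + "?" + urlencode(params)


def build_author_pages(records):
    """Group records by author, annotating each with policy and OpenURL."""
    pages = defaultdict(list)
    for record in records:
        enriched = dict(record)
        enriched["policy"] = PUBLISHER_POLICY.get(record["issn"], "unknown")
        enriched["openurl"] = openurl(record)
        for author in record["authors"]:
            pages[author].append(enriched)
    return dict(pages)


# Made-up sample metadata for two articles.
records = [
    {"title": "A study", "journal": "Journal of Examples", "issn": "1234-5678",
     "year": "2008", "authors": ["A. Author", "B. Writer"]},
    {"title": "Another study", "journal": "Example Letters", "issn": "0000-0000",
     "year": "2007", "authors": ["A. Author"]},
]
pages = build_author_pages(records)
```

Each entry in pages is one academic's list of papers, with the policy and link already filled in, so the "one click away" deposit decision can be made per record.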

I think this is brilliant.

The only thing I've seen that is similar is BibApp (see Code4Lib 2007, and IDEALS@UIUC: BibApp Presentations).  I saw that at Open Repositories 2008.

July 23, 2008

official press release about NRC repository

<marketing-mode>
CISTI has released its official PR on the upcoming repository.

NRC Publications Archive: Extending the reach and increasing the impact of NRC research

It includes some clarifications about what will be stored and the extent of access that can be provided.</marketing-mode>

Steve has some notes about the underlying technical architecture.

July 15, 2008

Mandatory IR deposit as of 2009 for National Research Council Canada

From an internal email (with permission)

[The NRC Senior Executive Committee] SEC has established a policy making it mandatory, starting in January 2009, for NRC institutes to deposit copies of all peer-reviewed publications (articles, proceedings, books, book chapters) and technical reports in [the forthcoming NRC Institutional Repository, to be called] NPArC. The SEC has also approved an update to NRC Form 22 Licence to Publish (Crown Copyright) that will explicitly state NRC’s intention to deposit these publications in NPArC.

As this blog is by no means an official source of information about my organisation, if you have any questions I ask that you go through regular NRC or CISTI communications channels.

http://www.nrc-cnrc.gc.ca/newsroom/index_e.html

http://cisti-icist.nrc-cnrc.gc.ca/media/newsroom_e.html

UPDATE 2008-07-23: There is now an official press release from CISTI.

April 02, 2008

OR08 - the presentation layer is destroying our data

I have lots of raw notes, but I'll wait to see whether the presentations show up at the Open Repositories 2008 conference repository (for some reason, I keep wanting to spell this "respository").

http://pubs.or08.ecs.soton.ac.uk/

One of the main themes that I've heard in terms of doing science with repositories over the past couple days is that presentation formats, particularly PDF, are destroying the data (e.g. chemical structures and reactions) that we have so carefully assembled.  Then we have to make machines work really hard to try to reconstruct this data, which is madness to me (although I accept it may be the only practical solution in the near term).

I would argue that HTML plays a similar role in emphasizing "what looks good" rather than adding to that "and is also usable by machines under the hood".

And in a different way, PowerPoint, with its constraints of display and its style of bullet points, discards our complex ideas and presents them in a lossy, radically oversimplified way (with a dependency of course on the skills of the presenters).

April 01, 2008

Microsoft Summit on Repository Interop - notes

April 1, 2008 - I had read the posting by Savas (probably via Lorcan), so it was great to have an opportunity to hear about Microsoft's thinking directly from them.  The most dramatic announcement was that Microsoft Research will be developing entirely on the Linux platform.

UPDATE: Lee Dirks said I almost gave him a heart attack with my little April Fools' prank, and the day is wearing on, so it's time to update and move my text up from the bottom...

Thanks go to Lee Dirks and David Flanders for making my first full day in Southampton a very interesting one.  The Linux platform bit was my contribution to April Fools.  MS Research Tech Computing are in fact of course entirely dedicated to Microsoft platforms.  ENDUPDATE

For further discussion of the MS Repository Platform efforts, they have created a group

http://community.research.microsoft.com/forums/90.aspx

I'm sure it has happened before, but it was the first time I had seen the leads/directors of Fedora (Sandy Payette), DSpace (Michele Kimpton) and Eprints (Les Carr) brought together.

There was a lot about SWORD and also some on OAI-ORE.

Notes on Microsoft Summit on Repository Interoperability event

Lee Dirks
External Research, Technical Computing
- Putting computing into science
- Putting science into computing

Science + computation are not the entire equation
* Microsoft must improve its offerings throughout the scholarly communication lifecycle

Approach: Conduct prototyping projects and proofs-of-concept to evolve Microsoft's scholarly
communication offerings

Five factors Microsoft considers key
* Interop is paramount
* Optimize for data-driven research & science
* Data preservation (and provenance) should be baseline
* Community protocols & conventions
* Social networking & semantic knowledge discovery

when possible IP shared at
http://www.codeplex.com/

Project Execution Models
* internal FTE
* external devel (vendor)
* external devel (institutional partner)
* mixed models

projects 1-2 years

Examples:
* GenePattern for Word 2008
- integrate data and images from GenePattern workflows into research papers
- will move into production in April/May 2008

* Math in Word 2007

* Chemistry Drawing for Office 15
- Peter Murray-Rust et al.
- Chemistry Markup Language (CML)
- proof-of-concept plugin ... but two versions of Office from now, Chemistry will be built-in (we hope)

* PLANETS
- EU project
- preservation of Office documents based on Office OpenXML (OOXML)

===

Savas
"Supporting researchers worldwide"

working towards an "eResearch Platform", a grouping of Microsoft tools that can support research

Flow: Author->Publish->Archive->Discover

Author
* Semantic Annotations for Word
(current target: protein databank)

* NLM DTD plug in - will support SWORD
- export a Word document in NLM DTD -> .nlmx

* Research Ribbon concept - tools relevant to researchers in Office

* can search arXiv from within Word using OpenSearch

Publish
* Conference Management Tool (also SWORD endpoint)
* eJournal - manage peer review (also SWORD endpoint)

Archive
* Research Output Repository (also SWORD endpoint and will support OAI-ORE)
* arXiv (also SWORD support)

? Repository interop/federation

Q: Shibboleth / OpenID support?
A: haven't started looking at it yet

===

Santosh
Microsoft's Research Output Repository Platform

Platform for storing scholarly works and metadata
- papers, videos, presentations, lectures, references...
- enables the development of new functionality and services on top of the platform
- relationships between stored entities

* SQL Server 2005 or 2008, Entity Framework, .NET 3.5

* the repository software (but not the servers) will be available to the community for free

Platform Overview
- variety of resource types (publications, tech reports etc.)
- resource tagging
- relationship between resources (triple-based)
- set of well-known predicates (IsVersionOf, Contains, etc.)
- new resource types and predicates through extensibility
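
To make the triple-based relationship idea concrete, here is a minimal Python sketch of an in-memory store.  It is only an illustration of the general technique, not Microsoft's API (the platform itself is .NET on SQL Server).  The class and method names are invented for the example; only the IsVersionOf and Contains predicate names come from the talk.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


class TripleStore:
    """A minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self, predicates):
        # The set of "well-known" predicates the store will accept.
        self.predicates = set(predicates)
        self.triples = set()

    def register_predicate(self, name):
        """Extensibility: allow a new predicate to be used."""
        self.predicates.add(name)

    def add(self, subject, predicate, obj):
        if predicate not in self.predicates:
            raise ValueError("unknown predicate: " + predicate)
        self.triples.add(Triple(subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return triples matching every field that is not None, sorted."""
        return sorted(
            t for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TripleStore(["IsVersionOf", "Contains"])
store.add("paper-v2", "IsVersionOf", "paper-v1")
store.add("proceedings-2008", "Contains", "paper-v2")
store.register_predicate("Cites")  # hypothetical new predicate
store.add("paper-v2", "Cites", "tech-report-7")

about_paper = store.query(subject="paper-v2")
```

Here about_paper holds the two triples whose subject is paper-v2; querying by predicate or object works the same way.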

Platform
* Core API
* Framework API
* OAI-PMH, Syndication, BibTeX, Search
- UI Web Controls

"A semantic computing platform"
- hybrid between relational database and a triple store

community.research.microsoft.com/forums/90.aspx

===

Stuart Lewis
Update on SWORD Protocol & Future Directions

http://www.ukoln.ac.uk/repositories/digirep/index/SWORD

- Simple Web Service Offering Repository Deposit

JISC/CETIS end of 2005
- identified lack of standard deposit API as #1 issue

2006: Creation of Repository Deposit working group

November 2006
- JISC call for funding, bid submitted for SWORD
- Julie Allinson
- lightweight and agile project

Workpackage 1: Evaluate existing standards
- WebDAV
- JSR
- OKI OSID
- ECL
- SRW Update
- SPI Google Data API
- ATOM Publishing Protocol (APP)

-> page on wiki examining them all

Workpackage 2: Tech Dev
- DSpace
- Fedora
- Eprints
- intraLibrary
* Java client library
- command line, desktop app, web interface

Workpackage 3: User testing and feedback
- arXiv
- SOURCE
- SPECTRa
- White Rose Research Online
- FeedForward

How does SWORD work?
* Two stages
- Discover
GET a Service Document
- Deposit
POST an item to the URI of the collection

GET
- X-On-Behalf-Of
- get a URI

POST

SWORD extensions to APP
* SWORD level
- 0
  - basic
- 1
  - full implementation

- X-On-Behalf-Of
- X-Verbose
- X-No-Op
- X-Format-Namespace

Discovery SWORD interfaces
* Recommend /sword-app
* Recommend /sword-app/servicedocument
* Recommend <link rel="sword" href="/sword-app/servicedocument" />

Authentication
- Required: HTTP BASIC

What?
- any package supported by the repository
- DSpace/Eprints: ZIP files with a METS manifest in SWAP format, with files
- Fedora: image files / METS documents (pull in referenced data streams)
- OAI-ORE resource maps
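
The two stages above can be sketched in Python as plain request descriptions (nothing is sent over the network).  This is an illustration under assumptions, not a SWORD client: the repository address, the collection URI and the HttpRequest class are made up, the "true" value for X-No-Op and the application/zip content type are my assumptions, and a real deposit would also need whatever packaging details the target repository expects.

```python
import base64
from dataclasses import dataclass, field

BASE = "https://repository.example.org"  # hypothetical repository


@dataclass
class HttpRequest:
    """A plain description of an HTTP request; nothing is sent."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def basic_auth(user, password):
    """Value of the Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def service_document_request(user, password, on_behalf_of=None):
    """Stage 1 (Discover): GET the Service Document."""
    req = HttpRequest("GET", BASE + "/sword-app/servicedocument")
    req.headers["Authorization"] = basic_auth(user, password)
    if on_behalf_of:
        req.headers["X-On-Behalf-Of"] = on_behalf_of
    return req


def deposit_request(collection_uri, package, user, password,
                    on_behalf_of=None, dry_run=False):
    """Stage 2 (Deposit): POST a package to the URI of the collection."""
    req = HttpRequest("POST", collection_uri, body=package)
    req.headers["Authorization"] = basic_auth(user, password)
    req.headers["Content-Type"] = "application/zip"  # assumed: a ZIP package
    if on_behalf_of:
        req.headers["X-On-Behalf-Of"] = on_behalf_of
    if dry_run:
        req.headers["X-No-Op"] = "true"  # assumed value for a test deposit
    return req


get = service_document_request("depositor", "secret", on_behalf_of="researcher")
post = deposit_request(BASE + "/sword-app/deposit/123", b"PK...",
                       "depositor", "secret", dry_run=True)
```

In practice the collection URI passed to deposit_request would be read out of the Service Document returned in stage 1 rather than hard-coded.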

SWORD 2
- follow-on project
? more APP
? UPDATE / DELETE
? more clients
? client libraries
? provide support to users

Q: What is relationship with APP?
A: none

Comment: Sandy - We need a basic protocol that supports read and write.
Comment: Michele - We need to get into workflow - Zotero, EndNote etc.

Q: OAI-ORE and SWORD together?

===

Experience implementing SWORD at arXiv.org
Simeon Warner
Thorsten Schwander

1. Background
2. SWORD implementation choices
3. Ideas for SWORD evolution

automating from Microsoft Conference Toolkit

CS unusual in that conference publications very important
- use arXiv to host open access proceedings

work internally at arXiv to present conference proceedings as a whole

http://arxiv.org/help/api

Authority
1. author
2. the conference organizer
3. the CMT system (will use the organizer's authority)

returning errors
- all additional errors returned HTTP 400 Bad Request
- return an Atom document for each error code

3. Ideas for SWORD evolution

* Primary goal should be to reduce pairwise customization

- improved self description
  - self-describe size limits for uploads
  - improved error reporting
  sword:errorcode with namespace (and with description)

Integration with complex workflows
- asynchronous notification

===

DSpace
Michele Kimpton

Interop

* Business
- need a defined business case / use case because there is a small developer community

community will rally around common protocols

* operational
- policy transfer-control
  - embargo, authentication, dark archive...
- metadata loss
- identifier compatibility and acceptance

* technical
- numerous content packages
- representation incompatibilities
- interpretation of standards

Community Efforts

* OAI-PMH, OAI-ORE, SWORD, METS, IMS, SWAP
* federation across DSpace repositories
* working with key apps
* integration with "content creation" tools to ensure materials are deposited

===

issues: strong standardization of library *DATA*
        weak standardization of repository data

===

Les Carr
Eprints

drawing funny diagrams

user level interop

===

Sandy Payette
Fedora Commons and Interop

2007 Content Model Architecture (CMA)
- Registry of "content model" types for digital objects

Now: Simplicity

2008: Atom Syndication Format, OAI-ORE, simple common web APIs with wide appeal
and adopt other standards where possible

high-end interop (web services apis)
backend interop (Akubra) - various underlying storage - transactional stores, Sun HoneyComb,
Internet Archive PetaBox

* Topaz - application level objects and semantic interoperability

light-weight ways to let apps define object types

info objects mapped into triples and persisted in Mulgara triplestore

* Fedora Middleware Projects
- Simple JMS layer with e.g. Gsearch, OAI, Ingest on top

What do users really want interoperability to achieve?

Q (me): heavyweight APIs vs lightweight?
A: light for integration with web apps, heavy inside enterprise

===

Issues
- federation & interop
  - support for delete, update
  - document formats
- content creation opportunities
- content flow -> ingest

discussion of harvesting for search, Google Scholar

how are people providing federated search
- OAI-PMH
- one-off federated integration
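
For the OAI-PMH route, the usual pattern is to harvest metadata from each repository into a central index and search that.  Here is a minimal Python sketch of just the request-building part.  The repository addresses are made up, and fetching, XML parsing and indexing are left out.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec    # selective harvesting by set
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build the follow-up request when a response was only a partial list."""
    params = {"verb": "ListRecords", "resumptionToken": token}
    return base_url + "?" + urlencode(params)


# Hypothetical repositories to harvest into one search index.
REPOSITORIES = [
    "https://ir.example.edu/oai",
    "https://archive.example.org/oai/request",
]
harvest_plan = [list_records_url(base, from_date="2008-01-01")
                for base in REPOSITORIES]
```

The from date is what makes incremental harvesting cheap: each run only asks for records changed since the last one.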

Andy said something like "there's fundamental tension between simple and complex".
You can find Andy's liveblogging of the event through his Twitter stream

http://twitter.com/andypowe11

March 20, 2008

Open Repositories 2008

Through an unexpected series of events I find myself going to Open Repositories 2008

http://or08.ecs.soton.ac.uk/

The lineup looks great including a keynote from Peter Murray-Rust, and two (!) sessions on Scientific Repositories.

There is also a Repository Challenge for developers with a £2,500 prize, which is like a million US dollars now (finally, Canadians get to make US dollar jokes).  Kudos to David Flanders for leading this "let's just build stuff and see what works" approach.

I will be blogging under tag/category or08, and twittering under hashtag #or08

I made an Upcoming event, mainly because then if you add the machine tag

upcoming:event=455039

to your Flickr photos, it will automatically put in a nice "Taken at Open Repositories 2008" logo.

February 06, 2008

whither the generalist library in a world of domain specialists?

Peter Murray-Rust blogging about the Academic Publishing in Europe conference (APE 2008)

Panel Discussion: What Matters? The Future Role of Libraries in Science and Society? Swallowed by OA Repositories, turned into University Presses or kept as Book Museums?

Here I have a problem. I appreciate that libraries have many roles and I’m a keen supporter. Guardianship of scholarship, preservation, access, etc. But this doesn’t come across in science. I see librarians because I’m working on information-rich projects but if I didn’t I wouldn’t. How many PhD chemistry students will come to the library. (We have a lovely library in our building, funded by Unilever, and students like working there because it’s quiet. But we wouldn’t build the same facility today. And Henry tells me that Imperial has closed its departmental library. They have a nice quiet work area - with terminals - but it’s not a library.  Librarians cannot make a new role out of being super-purchasing and contract officers for information - scientists neither see nor care. So I challenged the panel with this and similar points.

Science and technology move so fast that none of us can keep up. Subject librarians trained on the classical model cannot provide what scientists need. The bioscientists look to PubMed, EBI, PDB, etc as the repositories of knowledge - not to their institutions. What they need are information scientists embedded in their laboratories. People who know how to hack perl, python, Java, XML, RDF, RSS, etc. Where the flow of meta-information is from the scientist to the information scientists as well as the other way round. It’s a tall order. But the average 18-year old does not look in a library for scientific information - they look to Google and Wikipedia (which is why I contribute when I can find time).

Thes views are reinforced by what the biscoientists and physicists are doing. They create domain repositories. They either have large national or international organisations which are beneficient and wish to oversee the free movement of scientific infomation. With bio- it’s Pubmed and Pubchem, NCBI, PDB, EBI, etc. and with physics it’s arXiv and SCOAP3. These are domain repositories and that’s what we critically need.

I can see that certain primary research will naturally go to IRs - mandated fulltext, theses, etc. But  many will see Pubmed and SCOAP3 as the primary places, not their institution.

I guess underlying this is an element of social networking that the Internet exposes: allegiance to local institutions is an artefact of physical proximity.  When physical interaction is a real part of your community, this is not a problem - the local public library remains a real meeting place.  The university library acts as a neutral meeting ground and study area.  But we find in the online environment, people tend to coalesce around their interests, not their locations.  When you go online, do you go to your city or neighbourhood web network (if such a thing even exists) or do you instead go to sites around your personal network and interests: your Facebook friends, a digital photography site, your Warcraft Guild page and Guild Bank, your aggregator with blogs that interest you?

I never really quite got this school spirit thing of "our" team versus "their" team.  You may find that scientists consider their peers in their discipline as the group to which they owe their loyalty, not their institution.  That means their content and their efforts are going to flow to the online representations of their scientific network, whether that's domain repositories, conference sites, or specialised scientific discussion groups.

This is a challenge for the physical library, which brought together disparate groups on the basis of being the gatekeeper of physical content, and then built services (e.g. reference) for the crowds of people who flowed in.

One possible role is for the library to participate in the domain networks, as we see with the roles of NLM and British Library in PubMed Central and UKPMC.  And it's certainly a legitimate role to be the collector of the institution's output in an IR, as long as you recognize that the IR is just going to be one node in a much larger network of content that may be aggregated on a domain basis (e.g. one can imagine a chemistry portal that draws on PubChem, anything "chemistry tagged" across any IRs it can search, and other chem resources).

May 09, 2007

2007 IATUL and ELPUB programmes - open access and beyond

The ELPUB 2007 and IATUL 2007 programmes are up.  Both will feature topics in the areas of open access and scholarly communication.  It's interesting to me how much the technology and interests of the scholarly publishing community and the academic library community seem to be converging.

At ELPUB 2007, CISTI's Judy Best will be presenting

Challenges in the Selection, Design and Implementation of an Online Submission and Peer Review System for STM Journals

December 03, 2006

two perspectives on interoperability and what it could enable

In case you're wondering, I finally managed to clear my Bloglines backlog, including the published literature, D-Lib and Ariadne.  I found two very different, but complementary views on how service standards and standard interfaces can enable an enhanced scholarly workflow or other advanced combinations of services.

In Serving Services in Web 2.0 (Ariadne issue 47, April 2006, ISSN  1361-3200), Theo van Veen of the Koninklijke Bibliotheek in the Netherlands explores and explains some fundamental concepts of Service-Oriented Architecture and standard service interfaces.

In this article I discuss the ingredients that enable users to benefit from a Service-Oriented Architecture (SOA) by combining services according to their preferences. ... This concept is an extrapolation of the use of OpenURL and goes beyond linking to an appropriate copy. Publishing and formalising these service descriptions lowers the barrier for users wishing to build their own knowledge base, makes it fun to integrate services and will contribute to the standardisation of existing non-standard services.

In An Interoperable Fabric for Scholarly Value Chains (D-Lib, October 2006, Volume 12 Number 10, ISSN 1082-9873), Herbert Van de Sompel, Carl Lagoze et al. explore how you can build services using an interoperable network of digital object repositories:

This article describes an interoperability fabric among a wide variety of heterogeneous repositories holding managed collections of scholarly digital objects. These digital objects are considered units of scholarly communication, and scholarly communication is seen as a global, cross-repository workflow. The proposed interoperability fabric includes a shared data model to represent digital objects, a common format to serialize those objects into network-transportable surrogates, three core repository interfaces that support surrogates (obtain, harvest, put) and some shared infrastructure. This article also describes an experiment implementing an overlay journal in which this interoperability fabric was tested across four different repository architectures (aDORe, arXiv, DSpace, Fedora).

October 13, 2006

Institutional Repositories - 0915 David Moorman SSHRC

CARL Institutional Repositories: The Next Generation, an Access 2006 preconference
Tuesday October 10, 2006
09:15 David Moorman, SSHRC

mention of OECD policy on publicly-funded research

research communication

SSHRC has embraced OA in principle,
but a big challenge going from principle to action

a lot of politics - institutional, municipal, up to international

support for research journals
the role of the article as a unit in IR
knowledge mobilization - make publicly-funded research available

UManitoba data project

indicators of research impact
how can we prove/demonstrate that the investment in research is producing value?

* Does SSHRC have a policy?

No.

There is more to this than just mandating OA.

Currently: Embrace in principle.  Work out the details.  Promotional approach.
           Figure out how to support OA journals.  Conducting experiments to figure out best approach.

SSHRC vs. CARL: SSHRC is not institution-focused

challenge due to Canadian structure of provincial jurisdiction over education

there is still a limited adoption of IRs in Canadian universities

* What can CARL do?

build on other efforts
- look at University of California system
- EU - 7th framework programme (DRIVER project) - look at Montreal meeting - CANARIE -
  10 universities in 8 countries - European Research Area
- Public Knowledge Project

challenge: cultural, social, institutional barriers to real advancement
           SSHRC community has both highly conservative and very innovative members

Jean-Pierre Cote, University of Montreal - using longitudinal data sets at Research Data Centres
                                           1000 researchers/day using the RDCs
                                           surprise
                                           but opportunity

                                           set up IR that is methodologically focussed
                                           all research from researchers using social statistics
                                           about 5000 articles/year

Creative Commons Canada - dealing with copyright issues

letters going out first to authors, then to publishers holding copyright

model agreement will go up on the home page on the RDCs

SSHRC Transformation - new Strategic Plan

SSHRC IR?
- no technical barrier

Treasury Board - transfer payments
                 contract or grant
                 grant can't have post-award conditions, e.g. can't require article deposit
                 auditor-general has said this should be fixed
                 will be a long time before this is worked out

Official Languages Act - websites must be in both languages
                         how to handle research?
                         15,000 objects would have to be translated every year
                         SSHRC has asked their lawyers (through Industry Canada) for an opinion
                         untranslated repository would be ok... until anyone complained

CISTI is in the same boat - as are all the federal government libraries

but SSHRC can support the CARL community
- haven't given money to libraries directly since early 90s

challenges: $300 million from Fed Gov goes to universities anyway (indirect costs)
            out of SSHRC budget - but demand for this money far exceeds supply - currently able
              to support 20 - 25% of research community (NSERC can support about 60%)

two routes:
- OA journals
- IRs

Moorman thinks IRs will start to take off

thinks Google and Yahoo will start to move this, particularly on digitization

===

Q: UMontreal project?  Expand to all DLI (Data Liberation Initiative) datasets?
A: DLI is a completely separate project, and a much larger task

Carl Jackman - UManitoba - data set repository - they are at the very beginning stages

UVic - new data librarian - science data - Earth and Ocean Sciences - data sets behind Masters and
PhD research - what to do?

Moorman - also NEPTUNE project

response rate on SSHRC mandate on data sets - terrible response - almost nothing archived

Alison Ball - Federal Government S&T community looking at Official Languages for research

Q: how to deal with official languages then?
A: slowly win the political battle

there is a big challenge with the cost recovery model - if you have to charge, people can't afford it,
and it doesn't align with OA principles - but this is another political battle

===

all postings from this workshop I will technotag with CIR2006

September 24, 2006

JISC Digital Repository Wiki

At ECDL 2006, Rachel Heery pointed us to the JISC Digital Repository Wiki.

If I understand correctly, it's basically trying to collect information about all different types of digital repositories and related projects.

September 18, 2006

ECDL 2006 - tutorial - Fedora

14:30

Tutorial 5
Fedora
Carl Lagoze and Sandy Payette

[raw presentation notes]

Also see my notes from last year's European Fedora User Meeting.

They are tagged under EuroFedora2005.

* Intro
* Digital Objects
* Repository Service
* 15:45 Service Framework in Focus
=== Break ===
* 16:00 Semantic Web and RDF
* RDF and Fedora
* Case Study: NSDL
* Future Directions

Problem Space

* Complex, compound, dynamic objects

Theme

* From Documents to Integrated Information Networks

"a network-based scholarly communications system"

Repositories situated at intersection of key social and technical trends

Technical Context:
* SOA
* Web 2.0
* Semantic Web

Sampling of Fedora Community

* ARROW and DART - Australia
*** eSciDoc - Max Planck
* DRC - OhioLink
* Danish Technical University (DTU)
* Wegener Institute, Polar/Marine, Germany

Are there IR clients for Fedora?

** Fez http://sourceforge.net/projects/fez "DSpace plus"
* VALET http://www.valet.vtls.com/
* Elated http://elated.sourceforge.net/
* FIRE

Digital Object Model

Disseminators - metadata about things you can use [my summary] - a disseminator can tell Fedora how
to connect to a Web Service, but a disseminator is not a Web Service itself

Fedora Repository Service

* Exposes through SOAP and REST:
  - Manage (Ingest, Export, Validate, Version)
  - Access (Get)
  - Registry Search
  - Resource Index

Fedora Security Architecture

[various stuff]
* Shibboleth-to-Fedora servlet filter

XACML Policy

Preservation Support

[various stuff]
* Preservation Support Services (forthcoming 2006-2007) - being defined by working group

There are some performance limitations in triple-stores
- Fedora uses the Kowari triple-store
- NSDL is storing ~200 million triples?

Fedora and the Semantic Web

Motivation

* exposing repository as a network of objects
  - relationships
  - query the graph; discovery of related stuff
* indexing based on generalizable data model
* extensible enrichment of object descriptions
* inferencing from structure of graph
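
[my sketch, not from the tutorial] To make "query the graph" concrete, here is a minimal Python example that holds relationships as subject-predicate-object triples and looks up related objects. It does not use Fedora's Resource Index or Kowari; the identifiers, predicate names and helper functions are made up for illustration.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical objects and relationships (a journal with articles, an article with a dataset).
TRIPLES: List[Triple] = [
    ("article:1", "isPartOf", "journal:A"),
    ("article:2", "isPartOf", "journal:A"),
    ("dataset:9", "isSupplementTo", "article:1"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def related(triples: List[Triple], node: str) -> Set[str]:
    """Everything directly linked to `node`, in either direction."""
    outgoing = {obj for (_, _, obj) in match(triples, s=node)}
    incoming = {subj for (subj, _, _) in match(triples, o=node)}
    return outgoing | incoming
```

For example, related(TRIPLES, "article:1") finds both the journal the article belongs to and the dataset that supplements it, which is the "discovery of related stuff" idea in miniature.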

What are the applications?

* Digital libraries with structured objects
* Publishing Systems
  - Journals with Articles
* eScience
  - Text with Datasets
* Semantic networks

[There was more but I switched back to the other workshop.]

== end ==

September 08, 2006

new opportunities for your library - Ticer presentations

Many interesting presentations available from Ticer's Digital Libraries a la Carte, New Choices for the Future 2006.

Check the programme or the abstracts.

They all look interesting, but here are a couple particularly aligned with my interests:

  • Grids & e-Science: UK Experience & Their Potential to Impact Libraries, PowerPoint presentation as PPT file (7 MB) or as PDF file (4 MB), Dr. David Berry
    Note: It's entirely about grid and e-science, you'll have to figure out how it impacts your library for yourself.  The e-science cycle diagram on slide 10 is nice.
  • Aiming for New Levels of Cross-Repository Functionality, PowerPoint presentation as PDF file (774kB), Herbert Van de Sompel

via DigiCMB

August 14, 2006

British Library partnership to run UKPMC

The Wellcome Trust, as part of a nine-strong group of UK research funders, announced today that the contract to run UKPMC has been awarded to a partnership between the British Library, The University of Manchester and the European Bioinformatics Institute (EBI).

...

In the initial stages of the UKPMC programme, the British Library will lead on setting up the service, developing the process for handling author submissions and marketing the resource to the research community.

The University of Manchester will host the service – on servers based at MIMAS (Manchester Information and Associated Services) – and will support the process of engaging with higher-education users.

EBI, which is part of the European Molecular Biology Laboratory (EMBL), will contribute its biomedical domain knowledge and state-of-the-art text-mining tools to integrate the research literature with the underlying bioinformatics databases.

British Library PR - British Library-led partnership chosen to run UK PubMed Central
IWR - British Library to host UK's PubMed Central - August 1, 2006

Previously:
March 21, 2005  UK PubMed Central (UKPMC) and Wellcome Trust

July 14, 2006

institutional repositories preconf at Access 2006

Registration is now open for the free, full-day CARL Institutional Repositories pre-conference at Access 2006.

The pre-conference will be October 10, 2006, 9:00 a.m.-4:00 p.m.

This full day pre-conference is sponsored by the Canadian Association of Research Libraries (CARL) and is open to everyone interested in learning more about institutional repositories or sharing their experiences with others.

July 08, 2006

a couple upcoming events - repositories and code

*** 1 Open Repositories 2007 - 2nd International Conference on Open Repositories (ICOR2007)
January 23-26, 2007
San Antonio, Texas

CFP deadline:

October 2, 2006     Extended abstract, less than 500 words, double spaced

via Disruptive Library Technology Jester

*** 2 code4lib 2007
February 28 - March 2, 2007
Athens, Georgia

February 12, 2006

is the research library obsolete?

Assumptions:

  • scientific communication takes place through articles, whether pre-prints or post-prints, journal published or conference presented
  • most articles of scientific value will be subjected to peer review of some form
  • publisher websites provide acceptable access to articles, linked together online
  • articles are also brief enough to be conveniently downloaded (and then typically printed)

Types of library:

  • public library - provides access for general public to books (and secondarily to other published materials as well as transient formats like CDs, video cassettes, DVDs)
  • academic library - provides access for university community to books and academic journals
  • research library - provides access for researchers to books and academic journals

I assert that the public library still has some role to play as a community centre, and also because books are not (yet) convenient in electronic format.

Academic libraries have a role to play because undergrads don't know anything.  Every year there are undergrads who need guidance, and the academic library is there to help them.  Also, it is a good place to escape roommates, or find new potential bedmates.

Research libraries on the other hand, don't play any of these roles.  There is no public to serve.  There is no community meeting place role.  There are no confused or desperate undergrads to help.  So shouldn't a research library just

  1. digitize and index all of its current (out of copyright) paper holdings, and then send the paper into storage in some climate-controlled cave somewhere
  2. provide good licensed access to the necessary publisher websites for its researchers
  3. close down

Does anyone disagree that the traditional role of a research library, that of providing local convenient access to scientific publications, is erased by the presence of publisher websites on the Internet?  That being the case, what value is left for research libraries to add?  Researchers don't need (or want) the guidance or handholding that undergrads require.  Is there anything left for the research library other than inventing new roles for itself?  I can only see three roles that make sense:

  1. institutional repository for pre-prints and post-prints of the research organization's publications
  2. data repository for the research conducted at the organization
  3. providing advanced (data/publication/information/discovery/etc.) tools that integrate into the researcher's workflow

The first two roles are very much aligned with library and archiving roles, but may still require a bit of a revolution in how the organization sees itself.  To put it more concisely, either your research library becomes part of the E-Science Cyberinfrastructure, or it gets paved over.

How is your research library dealing with this challenge?  Have I missed something?

UPDATE 2006-02-15:

See previous and subsequent postings for some more ideas and thoughts in this area

UPDATE #2, 2006-02-15:

I have written my thoughts about the reaction to this posting in paved paradise: the future of (a particular type of) research library?

December 03, 2005

Digital Archives for Science conf underway, with blogging

Digital Archives for Science & Engineering Resources (DASER) Blog

http://asistdaser.tripod.com/daserblog/

October 31, 2005

Electronic Theses and Dissertations symposium Australia 2005 and Laval 2006

The 8th International Symposium on Electronic Theses and Dissertations - ETD2005: evolution through discovery was September 28-30, 2005 in Australia.

Most of what is available is papers in PDF format.  However for Stevan Harnad's opening keynote, both his paper Maximising research impact by mandating institutional self-archiving (PDF) and his presentation (PowerPoint) are online.

ETD2006 will be at Université Laval in Quebec, Canada.  In June 2006 I think.

Laval has lots of cool electronic developments.  The page for the Bibliothèque de l'Université Laval is certainly an interesting starting point.  They have been working on their electronic theses project for years, see

ETD Implementation at Université Laval (PowerPoint) from 2002
E-Theses Project at Université Laval (PowerPoint) from Access 2003
R&D @ Laval University Library -- Archimede and ETD's (PDF) from IATUL 2005

The repository itself is searchable at http://www.theses.ulaval.ca/

September 30, 2005

Fedora repository conference in Wales October 24, 2005

If you've been reading about the Euro Fedora [repository] meeting and wished you could be there, luck has it that the National Library of Wales is also hosting a meeting next month, and although registration has closed perhaps the nice chaps there may still let you in :)

It's called Digital Asset Management with Fedora [repository].

To be held on the 24th October 2005

Hosted by Llyfrgell Genedlaethol Cymru / The National Library of Wales, Aberystwyth, Wales, UK

September 28, 2005

Euro Fedora User Meeting 2005 - Fedora as an OAI-PMH (and WS) Compliant Data Provider

Fedora as an OAI-PMH (and WS) Compliant Data Provider (PowerPoint)
Ana Macario, Alfred Wegener Institute

* very data-centric organization
* involved in DataGrid

* SOA at AWI
[diagram of 2004]

In practice...
post-print -> repository
PI is supposed to release data when published, but by then, it is lost or there are excuses

So need Staging -> Publication

So: Fedora to do all this.

* Reasons for Fedora
- Virtual Repository
- not restricted to Dublin Core
- standards compliant
- etc.

[diagram of 2005]

currently using dc.source and dc.relation as a hack to express linkages,
convinced that proper way to do this is RDF/XML
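
[my sketch, not from the talk] A rough Python illustration of what the dc.source / dc.relation approach looks like in an unqualified Dublin Core record. The two namespace URIs are the standard Dublin Core and oai_dc ones; the title, source and relation values are invented placeholders, not AWI identifiers.

```python
import xml.etree.ElementTree as ET
from typing import List

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def build_record(title: str, source: str, relations: List[str]) -> str:
    """Serialize a tiny Dublin Core record with source and relation links."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    # dc:source and dc:relation carry the links as plain strings.
    ET.SubElement(root, f"{{{DC_NS}}}source").text = source
    for relation in relations:
        ET.SubElement(root, f"{{{DC_NS}}}relation").text = relation
    return ET.tostring(root, encoding="unicode")


record = build_record(
    "Sample data set",              # placeholder title
    "cruise:hypothetical-1",        # placeholder source identifier
    ["doi:hypothetical/article-1"], # placeholder link to a publication
)
```

The links here are just strings with no stated type (nothing says whether the related item is the article, a parent data set, or something else), which is presumably why typed RDF statements look like the proper way to do it.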

Long-term issues
* Benchmarking for large number of files
* out-of-box web client acceptance
* fine-grained access control and Shibboleth based AuthN - relevant in DataGRID
* support for sets
* federation model
* collaboration and support
- disseminators for visualizations services; relevant for DataGrid
- Eclipse project to facilitate plugin devel
- Google strategy?
- seminars, tutorials for advanced Fedora users
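
[my sketch, not from the talk] For readers who haven't seen OAI-PMH, a harvester talks to a data provider with plain HTTP requests. The Python below builds a ListRecords request URL using the protocol's standard parameters (verb, metadataPrefix, set, from). The base URL and the set name are hypothetical and do not point at a real AWI or Fedora endpoint.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Selective harvesting by set (only works if the provider supports sets).
        params["set"] = set_spec
    if from_date is not None:
        # Only records changed on or after this date (YYYY-MM-DD).
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and set name, for illustration only.
url = list_records_url("https://repository.example.org/oai",
                       set_spec="datasets",
                       from_date="2005-09-28")
```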

Euro Fedora User Meeting 2005 - Researching Fedora to Serve as Central Repository System of the State and University Library

Researching Fedora to Serve as Central Repository System of the State and University Library
Stephan Drescher (PowerPoint)
and Birte Christensen-Dalsgaard (PowerPoint)

* National Library - digital preservation
* Storage Preservation
- as many different systems as different objects (currently)
- including both digitized and born-digital (IR and Webarchive)

working on automatic ingest
working on national infrastructure for institutional repositories, with infrastructure in charge of
preservation - national infrastructure for storage with redundancy etc.

Preferred preservation strategy: migration

working with Royal Library in Holland who have an emulation program running with IBM

Stephan: Archiving of Denmark's Broadcasting of Radio and Television (BART)

* covering 24/7/365
* 220 GB a day
* data needs to be evaluated and eventually corrected after 48 hours
* automatically ingested into repository

How to fit Fedora within BART's resource workflow

Uses Linux and off-the-shelf.

80-100 TB a year
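
[my arithmetic] A quick check that the daily and yearly figures line up, assuming decimal units (1 TB = 1000 GB) and capture on every day of the year:

```python
GB_PER_DAY = 220       # figure from the talk
DAYS_PER_YEAR = 365    # "24/7/365" coverage
GB_PER_TB = 1000       # assumption: decimal units

tb_per_year = GB_PER_DAY * DAYS_PER_YEAR / GB_PER_TB
print(f"{tb_per_year:.1f} TB/year")  # prints "80.3 TB/year"
```

So 220 GB a day lands at the bottom of the 80-100 TB a year range.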

Q: are you going to put this data into the repository?
A: only references (to the raw data) will be put into the repository

----
