JISC Middleware Programme and Shibboleth
Terry Morrow, JISC Consultant
JISC Core Middleware
- directory services
[stuff about Athens - centralised authentication + authZ - technology + infrastructure (people)]
* Athens requires management of separate "Athens accounts"
- Recent development (AthensDA) uses local IDs
* Little take-up of Athens outside UK
* Service providers have to license Athens
* Centralised service - relatively high operational costs
* Not well suited to increasingly complex authN scenarios
* Meanwhile other countries moving to SAML/Shib
JISC Core Middleware
- tech devel
- better understanind of middleware
- build working Shib infra
- support takeup of Shib
* programme has funded 15 different projects so far
Core Middleware - Infrastructure
Aim: establish a working Shib infra
- Data Centre services (MIMAS and EDINA) to be made Shib compat
- create Athens-Shib gateways
- funding early Shib adopters
- creating service to assist adopt
- estab national UK fed (to be known as *Sparta*)
- liase with suppliers/vendors
Eduserv has written their own IdP
UK Data Archive - SAFARI
- Access control to a wide range of social science survey data
- the UK Federation - what to do
- UK WAYF will be established
- cultural change: shifting functions from libraries to computing science
Access Management projects at LSE (PERSEUS and ShibboLEAP)
John Paschoud, LSE (London School of Economics)
an institutional (university library) perspective on cost, benefits, etc.
- "a big database table with 3 million rows and 300 columns" (somewhat facetiously)
- LSE Shib-IdP had been previously established
- used to explore
= access to end-user info systems
= [other things I missed]
* used to bind 4 different internal systems, and then with others: uportal, Internet2, JISC, Endeavor,
* relationships with other projects:
DART (Digital Anthropology Resources for Teaching),
nereus (sharing economics resources), ShibboLEAP
LSE is in lots of federations
Terms of content licenses
* Initial scoping audit of LSE Library e-resources management and current e-licensing situation
* Active participation in Meridian (Endeavor e-resources management)
* Active particpation in major e-licensing initiatives
* Joint PERSEUS / NEREUS study of e-resources Access Management terms (across 6 Euro countries)
Looked at roles of users
[list of roles]
* slow progress with Signet/Grouper, uPortal, and WebCT
* need for a central UK (or wider) database of standard licenses for major suppliers that reflects
Shib-usable (eduPerson) attribs
Alumni groups as Virtual Organisations
* get the UK Shibboleth ball rolling, something about access to Eprints repository? ~150k people
* Role-based access in open archives: who is permitted to do what
* creates IdPs sitting on existing infrastructures at the organizations
* enables Eprints as Shib SP
* Regular Library AND IT staff at each site
* High-level buy-in
* Focussed Project Management Board
- defined tasks for each meeting (one thing for them to decide)
* AuthN: easy
* AuthZ: not so easy
* other institutions can use this as the basis for their Shib IdP projects
a bunch of URLs including
Shibboleth adoption by content providers (Shibboleth and ScienceDirect)
Ale de Vries, Elsevier Product Manager, Science Direct
[Elsevier marketing stuff...]
[background on authentication]
* Allows access to remote services using campus login credentials
* No extra admin overhead for customer
* Simpler for users
* Complex to implement
* Requires agreement on "rules of the game" between all vendors and institutions
* Shared / Federated
- fulfills customers needs
- win/win (less admin)
* we will continue to anonymous, campus-wide access whatever the tech
* we will continue to offer PZN in exchange for basic end-user registration
Shib benefits as Elsevier sees them
* replacement for IP authN
- removed admin burden of IP maint
- removes dependencies on network arch
* allows PZN based on local creds
* removes need to remember multiple user/pass
* avoids problems of proxy servers
* helps us provide the broadest possible access to our customers
Shib and SD: ramp-up
* April 2002 - workshop
* May 2004 - Shib release
* based on Shib 1.1
* held workshops to involve customers and Internet2
- anon non-personal a must
- provide option to PZN with opaque unique ID
- needed support for deep linking (need to authenticate no matter where on the site the user enters)
Shib and SD: ... testing...
* Dartmouth, Georgetown, NYU, UCSD, Penn State
* no major problems
* none of the pilot participants rolled out access to broad user community
Shib and SD: production
* Fed 2005: Moved to InCommon
* July 2005: multi-federation support due to release
- main issue is branding and IdP discovery in a multi-federation world
* University of Southern California: up and running in May
* Working with SUNY Buffalo and Georgetown
* in discussion with JISC (UK), SWITCHaai (CH - Swiss), SURFnet (NL)
- multifederation: no one runs a WAYF of WAYFs
:( end users don't understand federation concept
:) but federations are geographically oriented
[example of Shib login to SD - recorded live in production]
uses a cookie to remember your organizational affiliation
* Tech new, complex and rapidly changing
* Federations are in very early stages
* Uptake is key... we are at a critical point
* need to make implementation easier for smaller customers and vendors
* Elsevier is committed to making access easier for users and will continue to support Shibboleth
Q (from AA of Google Scholar): do you see a lot of IP spoofing?
Q: assuming that you are planning to offer Web Service access to Science Direct... do you have plans
to offer access control
A: Web Services... well... not yet.
Q: what do you log? user path through site?
A: yes we log, analysis is on aggregated basis, we have to log path to do reporting to customers,
and for product improvement
Q: to what detail do you know the user when they login using InCommon - e.g. student from university X?
1. can come in using provider ID - identifies institution
2. can come in using target ID - identifies individual - dependent on federation - if not wanted for
personalization, Science Direct discards this ID
Q: do you see a lot of use of personalization in your product?
A: quite some - between 5 to 25%
Second theme: Identity and rights management
AAI - the scene
Authentication, Authorization, Infrastructure
Defining and organizing a common framework for identity and rights management administration within and between organizations
for (higher) education, research and their e-resources.
Users at one institution can access external resources without needing to be a registered user of the external organization.
Rights and roles can be set in Attribute Release Policy (ARP).
The resource provider collects attributes from the user home
organization WITHOUT KNOWING THE IDENTITY OF THE USER.
lots of advantages, including economic
What is not part of the framework
* What material that may be accessed and by whom
* Payments or payment mechanisms
* Agreements on purchase of material ...
Overview of DEFF SOA activities
Mogens Sanfaer, Technical University of Denmark
they use Web Services interchangably with SOA
[diagram of 3-tier DEFF architecture 2000-]
* one common service infrastructure - many portals
* facilitating cooperation as well as competition
(services can compete)
* paving the way for a re-integration of library and otehr university infrastructure domains
e- publishing / libraries / research / learning / governance
[diagram about integration promise of web services]
[diagram that I don't understand about the b2b nature of the service landscape]
is it possible to distill common services out of current monolithic systems?
1. implement pilot web services
2. facilitate pilot exploitation
3. implement more web services
4. establish cross-domain scenarios
5. contribute to open source tools and international cooperation
6. bring vendors into the game
1. Pilot Web Services
- global eprints using Fedora
- digitised Danish journals based on Oracle
- XML to Z39.50 gateway ->
HTTP SOAP/REST Web Services gateway is open source, available for download
2. exploitation project Web Services Testbed (started this year)
- training courses, workshops, consultation
- explore technical means to ease exploitation such as WSRP, UDDI
- gather feedback - trying to find the killer app
- discuss business models issues
it may take a bit of time for library uptake of Web Services and SOA concepts and technologies
3. more Web Services
pull from DEFF e-learning
journals, books, everything as Web Services - for consumption by Blackboard, CampusNet, Open Source LMS...
*** projects trying to satisfy these needs
4. beyond the library domain
DEFF e-publishing is building a national research database and repository architecture based on XML Web Services
* The collaboration with the Danish Universities Digital Governance Project has created an opportunity
to integrate the admin and library domains e.g. systems to create reading lists
* would also like to be able to create a national portfolio system for students, to capture their work
(my note: another application that has been discussed for these systems is national research portfolio for scientists)
["complex" architecture diagram]
* Is there a simple architecture?
synchronize local student portfolios (using harvesting and aggregation) with national portfolio
requires standard person id
how do you scale to e.g. pan-European? intercontinental?
5. contribute to open source and international cooperation
* X2Z gateway released, more to follow
* Fedora as generic repository architecture
- contribute especially in the areas of search and preservation services
- Euro Fedora User Meeting
6. bring vendors into the game
* Library system vendors et al. are starting to show interest in Web Services
Perspectives for the next couple of years
* implementing more web services
- OAI-based cross-searching services for research databases and library catalogues
* facilitating more exploitation
- implement best-practice examples of portals based on Web Services and open-source software
* crossing library domain boundaries
- contribute to the development of e-Science infrastructures in Denmark
* open source
- continue Fedora cooperation
* bring in commercial vendors
* exploit VIEWS - Vendor Initiative for Enabling Web Services
The DARE architecture and SOA-developments in the Netherlands
Martin Feijen, SURF
* SOA in higher education
* DARE technical architecture
* DARE organization
* conclusions and questions
help libraries to make the changes needed to support scientists in their work
SOA in higher education
- SURFnet (people over 18) is planning to use SOA in the very near future
- awareness of performance issues: protocol overhead may slow down inter-application response
- SURFnet Video Portal and DARE harvester are the first SOA implementations within SURFnet
SURFnet is making the case to their management to do SOA, but the management says "what is the business case"
Kennisnet (Knowledge net) - people under 18
- implicit SOA policy: focus on re-use of applications
- SOAP/XML interface between local web apps and central portal
* architecture is SOA based but
- to avoid performance, constraints are necessary
- use only when applicable, not blindfolded
- Task force group "Information Architecture in Higher Education" April 2005.
No specific recommendations about using SOA.
- Report of the Scientific Council of SURF 2007-2010
Advice: use SOA as architectural framework for the further development of the technical infrastructure.
- building a network of institutional repositories in the Netherlands
- uses OAI-PMH
- DARE is successful but we have a need for optimization
[diagram of DARE as is]
- app interface between local repository and other tools (e.g. Metis system)
= organizational issue
- no consensus persistent identifiers
- no solution for complex documents
- no unique identifier for personal names
- metadata quality issues
- sets and/or filters
Planned work on Technical Optimization
* content spider and filter (want to include datasets and not only publications)
* Digital Author Identifier (each scientist in the Netherlands will be assigned a number) - will integrate with OCLC/Pikas
* pilot for e-theses using DIDL and extended OAI
* persistent identifier
* demonstration project to create 15 operational learning object repositories, building on DARE architecture
* OAI-PMH, LOM and IMS-CP as standards
[diagram of Lorenet]
* Libraries need to change
* We want scientists to use repositories, but we don't speak their language
(Approach has been inside out - from library to scientist. Need to change to outside in.)
[diagram of organization, showing missing links between faculty and library]
* library self-evaluation and summer school for libraries
* embedding DARE in library workflow
* get closer to faculty and students
* not perfect but "good enough": light weight, flexible and little steps (SOA might be very handy to support this)
* need to do architecture AND manage change
* facilitate and encourage
* Primary goal: repository as a tool for research and learning
* collect, store, describe, disseminate and secure all digital objects that are relevant for scientific communication
* So: not only end products like PDFs or articles but also datafiles, models, learning materials etc.
* SOA is known mainly in the ICT community
* SURF, SURFnet and Kennisnet will use SOA (prudently)
* DARE will migrate to DIDL and extended OAI infrastrcuture
* DARE will facilitate libraries in their change process and to go beyond publications
Q: why DIDL?
A: have been talking about compound docs for about 2 years, had meeting to look at solutions...
decision was DIDL
Q: if you have persistent identifiers, won't you need a resolution service - one exists - ??Swedish steff program???
continuing Service-Oriented theme
Developing e-infrastructure to support new research and learning paradigms
Dr Liz Lyon
focus on research
focus on eBank UK
1. e-Research: a changing landscape
how do we disseminate?
diverse types of data
think about how data collections evolve over time
Recommendations from "Large scale data sharing in the life sciences" UK report June 2005
- standards, metadata
- data management
RCUK open access to data
"should be made available as widely and rapidly as possible"
[diagram of the scholarly knowledge cycle]
2. Developing repositories
"service-oriented technical framework... for research and learning"
* reference models
* service definitions
[diagram of JISC Information Environment Architecture - predates e-framework]
[diagram positioning eBank in scholarly knowledge cycle]
eBank UK Project
- open access to datasets
- linking research data to publications and to learning
- JISC funded from Sept 2003: now in phase 2
Exemplar: e-science testbed 'Combechem'
- grid-enabled combinatorial chemisty / crystallography
- national crystallography service
PSIgate (physical sciences info gateway) at Manchester
[diagram of data flow in eBank UK]
[diagram of combechem]
[crystallography workflow diagram]
Crystal Structure Data Reports
data is harvested (OAI), and then aggregated by eBank service
"proof of concept demonstrator project"
*** Linking data to publications ***
done in eBank UK portal
eBank also embedded into PSIgate portal
Issue: Ontologies for discovery in an interdisciplinary world
Issue: Persistent identifiers for data citation
- working on use cases
- various schemes: DOI, handle, ARK, PURL
- there are some identifiers within domains
Publication and Citation of Scientific Primary Data Project
National Library for Science and Technology (TIB)
University of Hanover, Germany
DOI for datasets
can cite data using DOIs
Integration into crystallographic publishing practices
working with IUCr journals
Integration into chemistry research workflows
* R4L - Repository for the Laboratory
* SMART TEA electronic lab notebook + annotations, myTea project
* How does this fit with research assessment (RAE) process? (UK process)
Integration into the curriculum and e-Learning workflows
* MChem course
* assess role in undergrad chem courses
* introducing school children to e-research?
Knowledge extraction and post-processing
* mining (data, text, structures)
* presentation (visualisation, rendering)
* in federated repositories: digital libraries, datasets, learning materials
* role of Google??
Repositories and digital curation
data preservation is in conflict with data curation (active use)
upcoming DCC conference September 29-30, Bath, UK
Q: How are you making the links between the publications and the datasets
A: we're looking for identifiers to identify the datasets to enable linking
who does that? publishers? institutions (within repository)?
currently in the demonstrator, they have a schema,
but they are looking at automated ways
Comment: Germany - subject-oriented information nets - initiative from the researchers themselves
They choose and organize the info they way they want. Not that well-structured, but they answer a
bit more the needs of the researchers
I found Carl Lagoze's talk on the next-generation NSDL particularly interesting. Basically he talked about two stages in the evolution of the understanding of digital libraries. In the first phase, people were just striving to get content online. At that time they maybe couldn't even imagine that within a few years, the Internet would have vast amounts of information available. In some ways, people were proceeding from an imperfect model of the library:
The formal part of a library is books come in, get catalogued, and then people use the card catalogue to locate books on the shelves.
This was replicated in content repositories (digital libraries, phase 1).
But this missed the whole informal, social aspect of the library. You lose both the community and the context when you just have raw indexed content online.
If I can turn a phrase, in a library, people congregate around other people, not around the card catalogue.
So in phase 2, they are attempting to bring back the contextual and social aspects of the library. This means that what some people may consider frivolous applications: blogging, photo sharing, instant messaging, recommendations, bookmarks... all these aspects of person-to-person communication need to be integrated. As well as automated "wisdom of crowds" technology: "the person who read this paper also read these papers" etc.
The good news is, there is no need to reinvent the wheel - the whole point of the Service-Oriented Architectural (SOA) methodology is that your role becomes as much one of integration as of creation. Take the best of the online services, and integrate them into your site.
The other part of the SOA paradigm is, in a way, that small, simple, imperfect but quickly developed and released pieces of technology can be more successful than complex, perfected tech. "Good enough" can be great.
I'm actually a bit concerned that we may get too caught up in elaborate Web Services architectures when simpler protocols may in some cases work better. But we have to balance that against getting trapped in silo solutions. The key is to architect so that components are loosely coupled. Herbert Van de Sompel showed how doing a good internal architecture with strongly separated components means that those components can then be re-used in external applications and distributed environments. In effect, he treats his local network as if it is a distributed network.
JISC-DFG-SURF-DEFF Knowledge Exchange Initiative
Four Nation Collaboration Agreement to Support ICT Development in Education and Research
Knowledge Exchange Office, Copenhagen
Denmark (DEFF) - Denmark's Electronic Research Library
Germany (DFG) - German Research Foundation
Great Britain (JISC)
Mission, Vision and Goals
- support innovative devel and use of ICT in edu and research
- increase return on investment
- enhance services and project through greater knowledge dissemination
- increase profile of national research
- where appropriate, strive for common infrastructure based on common standards
- two delegates (board members) from each country selected by national funding agencies + chair from hosting country (Denmark)
- national representative from each country
will look at additional countries in Europe and outside Europe...
http://knowledge-exchange.info/ (not up yet)
Official Launch December 2005.
Lessons in cross-repository integration learned from the aDORe effort
Herbert Van de Sompel, LANL
[scholarly communication diagram from 2003 OCLC Environmental Scan showing repositories in the centre]
build value chains across repositories
a few words about aDORe
aDORe is not a product
- components of aDORe software, usable in other environments, will be released
- services build on terabytes of locally stored content (Elsevier journals etc.)
- broke tight integration between data and app
- standards-based, modular, distributed, protocol-based interactions between modules
two front ends:
- OAI federator
- OpenURL resolver (for ? compound/complex object services? not a regular resolver)
Uses hundreds of OAI repositories locally.
Basically because of the good, loosely-coupled design, you can derive insights about inter-repository compatibility.
Repositories and units of communication
- Data-oriented research: not only text but datasets, software, simulations, dynmaic knowledge presentations
- Facilitate collaboration
Think about compound objects.
- has a persistent identifier
- contains material, and metadata about those materials
- can contain other compound objects
- minted by different repositories
- from different namespaces
- not (necessarily) locators
"I don't think it's accceptable to ask everyone in the world to use e.g. the handle system"
need XML-based representation for compound objects
- many options MPEG-21 DIDL, METS, IMS/CP, RDF...
- OAI interface for compound objects.
- Use OAI-PMH datastamp ~= new version.
- include provenance
This is NISO OpenURL - framework to define service-oriented applications - applications for classes of objects that you can describe with identifiers
You can pass a SERVICE TYPE in the OpenURL.
Conceptual interface is persistent.
OpenURL examples were in HTTP, but conceptually you could do this in SOAP - just pass the parameters.
Repository Registry - who is part of this federation of repositories
Object Registry - what is part of the federation
Query: get list of existing copies, and the INTERFACES to get those things
There is similar work in the area of learning object repositories.
- Dynamic Service-Oriented Overlay upon the federated architecture
Service Overlay OpenURL application - list of services that can be applied to an object (the services are decided by the LAYER, not by the repositories)
"magic engine" - a knowledge database that knows about potential properties of objects and relates those to potential services
You could have multiple service overlays in a federation.
This results in the ability to provide context-sensitive dissembinations of digital objects.
[demo. Very cool.]
If we can meet the requirements he presented, many interesting capabilities are possible.
Q: Open Knowledge Initiative (OKI) also failed.
Why not something like WebDAV, so we can talk outside the digital library domain.
A: OpenURL standard has come out of our world, but it has extreme potential.
Q: This is great work.
A: Thank you.
this is a big project about improving educational tools for Science Technology Engineering Medicine (STEM)
evolution of digital libararies
... federation - metasearch
lots of questions remain
but we are moving beyond this
We thought the work was getting stuff online: but that's (mostly) done.
The digital library thinking was around a warehouse model...
but a library is not a warehouse.
"The real goal is to re-establish the library as a knowledge environment where people organize around
information, contribute new information, and learn from each other."
Although library information flow was books - to catalog cards - to drawers,
there was a second flow: people in the library discussing.
In digital libraries we automated the part where we captured the info,
but we lost the discussion.
Can we capture and enable that discussion, the social network, within the new repositories?
"creating an integration mechanism for specialized audiences"
Creating a Collaborative Knowledge Network
"a web that sits above the web"
The web was not intended as TV - we work together to create knowledge.
So: what other things are doing this well?
Items are more complicated than just being individual unique "stuff".
Items may be polymorphic. Items may be created by the action of a dynamic service
(e.g. different colours of the same model of fridge - is each one a separate item - no, it's
an object with a colour service you can apply)
Concept of Information Network Overlay
About the NSDL
Phase 1: Metadata-Centric Approach
- massive metadata quality issues
there are broader problems
- access alone does not equate to educational value
We want to capture CONTEXT.
Components of a new approach
for a resource
- who used it?
- how was it used?
- how was it described and rated?
- how did THEY classify it
- how does it relate to standards
- how has it been aggregated
- what has it been used with
they want to use the information network overlay to represent it
using Fedora as the basis for NSDL Data Repository (NDR)
- Web Services association for info reuse/refactoring (e.g. "summarize for grade 11 level" service)
- Versioning ("I want last week's version")
used to do metadata ingest (even for web pages)
now: Focused Crawling and Selection
- expert seeded crawls
- expert-guided crawls
Description (Phase 1): manual Dublin Core
Description (Phase 2): use machines
Augmentation (Phase 1): Ask NSDL (not integrated with the rest of the NSDL)
Augmentation (Phase 2): NDSL Expert Voices - blog system
research area: how to build up automated annotations based on the blogs
instructional architect: tool to build e.g. lessons using NSDL
Q (he asked himself): are people going to contribute to this?
A: I don't know, but we have to try.
Q: combine resources from library and ... NASA, NOAA, Geographical Survey...
do you need to use just Fedora?
A: you can access the Fedora services
Q: quality control - metadata? annotations?
A: one way is to vet every resource - but this defeats the purpose of the crawler
Also, if you have a ranking system, the good stuff will bubble to the top.
"the wikipedia approach... statistically the good stuff will peek through"
Service Oriented Architecture
- the framework of the grid?
Introduction to the seminar theme
Technical University of Denmark
26 September 2005
The concept of SOA
- a grid of services that ... may be used flexibly... SOA is an *architectural paradigm*.
- not a particular implementation or product
- independent of platform
* Loose coupling
* Open standards
- directories and communication standards will support a very distributed grid of services
Searching scholarly literature: A Google scholar perspective
Anurag Acharya, Google
Goal: Best possible scholarly search
* Single place to find scholarly material
- search everything
- Relevance-based ordering
* Easy to use
- common queries should just work
- researchers just want answers
Idea: Index all forms of artciles
* Preferred: fulltext (fulltext only was initial goal)
* Fulltext online for only small fraction
- influential/seminar papers still offline
* Index whatever form is available
What the author thought was important (in the abstract) may not be what turns out to be
important in the end.
Idea: Be inclusive
* Provide worldwide visibility to all research
- Should be able to find research done anywhere
* Our goal is to find all scholarly work
** Make decisions on a per-article basis *
- Good work can come from anywhere
Idea: Univeral discovery
* Free to all users everywhere
* Access will depend on variety of factors
Idea: Rank as researchers do
* Ideal: The Stuff I Need To Know
* Approximation: Relevant stuff that is likely to be good
** How to estimate "likely to be good"? *
- who wrote it, where it was published, how many people cite it, where citations are from
* Plus usual information retrieval techniques
Idea: Automate citation extraction
* Necessary to be able to scale
* Much variance in citation styles
* Citations error-prone
** Need to normalize citations *
Idea: Rank work, not instances
* Single work may have many forms
- preprint, report, conference paper, journal article
* Each may be cited independently
but it should be grouped together
* Known in library community as FRBR
Idea: Links to offline content
* Libraries hold huge repositories
* Link to library resources
Challenge: Article selection
How do you decide what is scholarly?
"If it looks like a paper, it is very likely a paper" - if it has author, title, citations
Use citations to locate stuff.
Identify sites with many cited papers (to discover uncited papers).
Challenge: Citation extraction
Citation parsing challenges.
Citations styles can even be different WITHIN a paper.
How much of a language model do you need to differentiate words that are likely authors, words that are titles...
Challenge: Citation normalization
* Many sloppy citations, propagation of errors
Resource usage (Regazzi at NFAIS 2004)
Top 3 Online Scientific Search Resources
Librarian top search: Science Direct
Scientist top search: Google
Resource usage (LibQUAL CNI Spring 2005 Task Force meeting)
* most users use Google and Yahoo rather than going through the library web page
Cooperation with libraries
- Work together to help find the wealth of libraries
- Utilize the trend to search engines instead of fighting it
Support for libraries
* Library links
* Library search
- Open WorldCat
* Access to Google Scholar
- embed Google Scholar searches in library interfaces
Library links - details
* Link resolver provides config option
- if selected, journal holdings info is exported to Google
* Google crawlers periodically fetch these holdings files
* No authentication at Google
- Authentication by provider/publisher
- Link resolver can proxy/suggest authentication
* Links for online resources are highlighted
- Users are far more likely to utilize online resources (factor of 5 higher CTR)
* Linking is open to all libraries and free
- currently 325 to 350 libraries are participating
Google does IP recognition both for Google Scholar and Open WorldCat.
Additional features are enabled based on IP (e.g. ILL).
Exposes the library resources in the normal course of research through the search engine.
* Open to working with other union catalogs
- contact scholar-library at google com
Embedding Google Scholar
- specific searchs
Question: Can I combine this with / use this for metasearch?
- A: No. Ranking is tricky, interface is still evolving
Google Scholar Coverage
* Fulltext from all major publishers except Elsevier and ACS
* Includes popular papers from all publishers as citations/A&Is
* content from: Highwire, AllenPress, MetaPress, Atypon, Ingenta, MUSE, others
* Public A&Is - PubMed, ADS (Astrophysics Data Service)
* Open web and repositories: Arxiv, Repec, pubmedcentral
* open access journals - all Google can find
[Q: OAI or only web?]
Countries with most queries: US, UK, Australia, Germany, Mexico, Brazil, Canada, China, ...
* Audience will exapnd beyond scholars
- health/medical research, educated laypeople, patients, care-givers
Q: Citations and web links?
A: Only citations
Q: How to make repository easy?
A: must be able to follow links to each paper
- if only search, can't find
- if chopped up, can't chunk in scholar
Q (me): Harvest using OAI?
A: yes, but no easy way to determine OAI harvestability - need to email Google Scholar.
Argues it is best to expose for web crawling, so wider discoverability.
Q: Options for embedding?
A: search box, or prepopulated search
Futher integration (e.g. metasearch) impractical due to problems with ranking / ranking not possible.
Q: Humanities are not very well served from Google Scholar
A: I am trying.
Challenges: journals are not online. Many small groups (small publishers) to talk to makes the process slower.
Q: Topic-based / subject-based searches? e.g. "genetics" should provide the most important results in that field
A: Two issues with broad queries [? missed the answer]
A lot of important material is presented in a summarized form for e.g. undergrads, very difficult to provide this info.
Q: Repositories of scientific data, crystallography : raw material for science - important material, not very often cited
How do we get scientific data on the web?
A: Use Google. It will find all manifestations of information on a particular topic.
Google Scholar is specifically for articles.
Q: Loss of context - Google is artifically reconstructing context
[hard to understand this question]
In a grid environment where the context is preserved, what is the value of Google Scholar.
A: If you already had the context, there would be lots more things you could do.
Q: Google has plans to do more than than search - text mining across the papers?
Taking it a stage further - extract conceptual links.
A: Not in the forseeable future.
Q: Google supporting new standards? [? something grid standard ?]
[This is maybe a "will there be a Google Scholar Web Service question?]
A: If you can build web pages, you can connect to Google Scholar.
Q: (can you turn consolidation of versions on and off)
What is a strategy for recognizing something is a version of something else?
UK project: Versions
A: [basically no time to explain] - suggests a particular paper to read
Q: Categories ? Using library classifications manually?
A: automatic: scholarly papers are self-partitioning in broad categories
Q: Are you working with librarians (at Google).
A: No. There are two of us, but both of us are programmers.
Q: Issue with primary data sets
We are a very data centric organization, bridge publications and data sets.
Marine bioscience - citing primary data sets.
A: No such plans (data) in the near future.
Q: 325 libraries - are they going to report?
We had a lot of problems (UK Open University) - the metadata wasn't there to link.
A: Let's talk. Metadata may not be complete.
If your link resolver requires full metadata, there will be problems
Q: Linking to data sets
eBank - threeway conversation
A: let's talk
Q: UColorado - what is your business model
A: I have none. No plans to charge users, publishers, libraries.
This is currently a small operation, so there is no priority (right now) on monetization.
Building the Info Grid (BIG 2005)
Kim Østrup (DEFF)
Next generation of library services - how can they be developed?
Deployment of Technology
* Internet 2 services
* Authentication and Authorization management
* Grids and network computing
- storage grids
- computing grids
* standardisation and consolidation
* service architecture and web services
* which open systems and standards (XML, ...)
* How to migrate from legacy systems?
What policies can promote a strong national and international structure?
Research and E-Publishing policies
* policies needed on all levels
* negotiations with the publisher
Open innovation vs Intellectual Property Rights (IPR)
- We must support both
open archive / open source / creative commons / wikipedia
commercial databases / proprietary software / music labels / publishing houses
Are we creating value? and for whom?
DEFF 2006 - New strategy
- infrastructure (grid?)
- a national infrastructure for single authentication and authorization
- e-publication, open archive and IPR
- develop next generation of services
() web services
() shared services
- knowledge exchange
- education and information literacy
- research and innovation
In one week I will be at Building the Info Grid 2005 - Digital Library Technologies and Services.
It lines up well with my interests as it will be covering both Service-Oriented Architectures as well as Shibboleth-based distributed authentication.
I didn't really know much about Copenhagen (København), Google Earth was a big help to me in figuring out the lay of the land.
I made a placemark for the Copenhagen Business School:
I also found this map (PDF) of some hotels associated with the conference to be useful.
I have gathered a few bookmarks together under http://del.icio.us/rakerman/copenhagen
I made a placemark for building 101A at the Technical University of Denmark (DTU):
I also found this map (PDF) useful for figuring out the bus stops on the DTU campus.
I will have instant messaging on if you want to contact me, there is more info linked from my conferences page.
(Note: Due to TypePad MIME type issues, you may have difficulties downloading the placemarks. In Firefox, right-click and use Save Link.)