Posts categorized "Federated Searching"

March 27, 2008

Building SkyNet for Science - presentation for NISO Discovery Tools Forum

My presentation is available at

http://www.slideshare.net/scilib/building-skynet-for-science-discovering-new-frontiers-using-embedded-knowledge/

A lot of it is conceptual, so you may want to wait until the audio is available from the NISO site (hopefully next week).

UPDATE 2008-03-28: I forgot to mention that all of the supporting links for the presentation (will be) available at http://www.connotea.org/user/scilib/tag/nisodiscovery2008  ENDUPDATE

I thought it went well, although as the first speaker up I had the disadvantage of not seeing how other people set up.  I was in a bit of a rush to get started so that I would finish on time, so I didn't do a great job of attaching my mike and just held the wireless transmitter in my hand.  With the transmitter in one hand and my laser pointer gripped in the other for the entire 50 minutes, it's possible I looked a bit of a prat.

I usually try to remember to keep my hands free for presentations so that I can use more natural body language.  Anyway, lesson learned.

I also forgot to say "The future is not set.  There's no fate but what we make for ourselves." before the last slide.

There are some common themes emerging from the presentations.  I'm always amazed when a bunch of people develop presentations in isolation and then they actually all fit together when presented.

I've posted some photos of the presenters in my Flickr under nisodiscovery2008 (my cameraphone can upload directly to Flickr over WiFi); they also show up on the Upcoming page because of the machine tag linkage.  No pics of Chapel Hill yet, as I don't have a car, and it turns out that while we're only about 4km from the town, the most direct route for me to get there would, I think, be to walk beside a six-lane divided highway, which is not too appealing.

UPDATE 2008-03-28: The carbon offset for my flights from myclimate.org (including the trip to Open Repositories) was about C$118.

February 08, 2008

CNI Fall 2007 presentations, podcast

Lots of interesting material from fall CNI.

An audio interview with Birte Christensen-Dalsgaard, Director of Development at the State and University Library in Aarhus, Denmark about the Summa search system and other academic library topics.

Current Experiments & Future Directions in Scholarly Communication - Timo Hannay, Nature

The eCrystals Federation: Open Data Repositories Supporting Open Science
Liz Lyon, University of Bath
Simon Coles, University of Southampton
Manjula Patel, University of Bath

Summa podcast link via DigitalKoans

Previously:
October 26, 2006  the future of the scientific paper and more on open web science (Timo Hannay)
March 31, 2006  presentations on e-Science and e-Biz workflow, and research data preservation (Dr Liz Lyon)
October 11, 2005  ILI2005 - Tuesday 11th - Living with Google: New roles for libraries (Birte Christensen-Dalsgaard)
September 27, 2005  Info Grid 2005 - Tuesday 27th, 09:00 - Developing e-infrastructure to support new research and learning paradigms (Dr Liz Lyon)

June 23, 2007

WorldWideScience.org

The ICSTI 2007 Nancy public conference finished with a presentation of the

WorldWideScience.org

international federated search science portal.

[DSC00370.JPG]

December 15, 2006

IT directions for National Library of Australia

Libraries, IT and Everything
Mark Corbould
Assistant Director-General
Information Technology
National Library of Australia (NLA)

[IMG_0347-2030347]

presented at Library and Archives Canada (LAC)
December 15, 2006

presentation notes by me, Richard Akerman

- strategic vision of the national library of australia

- function: basically, maintain a national collection and make it available

- 479 staff
- over 550,000 [physical, presumably] visitors
- 110 million page views on website, up 59%

~ 9 million items (including 2 million manuscripts and 2 million serials)

Access

- 365,000 physical items delivered
- 45,000 reference enquiries (4,000 via Ask Now)
- 13 million digital objects delivered

~80 TB of digital object storage, including Australian Web Domain Harvest
- main stuff digitized is unique cultural heritage
- 40% of digital collection is oral history

Web Site Usage

- nice chunk on PictureAustralia, new chunk on MusicAustralia

- IT Strategic Plan (3 years, annual update)

Strategic Directions [phase] 1

* for a long time it was build, describe, preserve and provide access to the physical collection

Strategic Directions [phase] 2 [in addition to phase 1]

* since 2000 "to provide rapid and easy access..."
- outcome: online federated discovery services with "no dead ends"

Looking Outwards

* Internet-enabled collaboration
- lots of freely available content
- organizations are making their content available for remixing and repurposing
- low barriers to participation
- light-weight trust models, no MOUs
- sharing personal views of contemporary events
- creating social networks around areas of interest
- near enough is good enough
- loss of control
- need to be less risk averse

Strategic Directions [phase] 3 [in addition to previous]

* In 2006, "to enhance learning and knowledge creation by further simplifying and ... services..."
[slide switched before I got the rest]

Key IT Outcomes

1 ensure collecting record of Australia
2 To meet user need for rapid
5 Ensure relevance

[yes, he skipped a couple]

IT Goals

many including
* Develop and deploy new full-text search software
* Provide online spaces to support publishing, collaboration, contribution and interaction

The rest of the world wants to find stuff in Google etc.
"we've got to get our data out there"
worked with Google to get their books in Google Scholar

Get this item

* Bookshops
* Suppliers

Also A9.com - Libraries Australia is searchable by using OpenSearch lightweight protocol

working to do federated search across museums and other organizations, using OpenSearch
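
As a rough illustration of why OpenSearch counts as lightweight: each source publishes a URL template, and a federated client just fills in the search terms for every source.  The sketch below assumes hypothetical template URLs and helper names (fill_template, federated_urls); only the {searchTerms}, {startIndex?} and {count?} placeholder style comes from OpenSearch itself, and this is not a description of how Libraries Australia actually implemented it.

```python
import re
from urllib.parse import quote

# Hypothetical URL templates of the kind an OpenSearch description document publishes.
TEMPLATES = {
    "library": "http://library.example.org/search?q={searchTerms}&start={startIndex?}&n={count?}",
    "museum": "http://museum.example.org/find?terms={searchTerms}",
}


def fill_template(template, **params):
    """Substitute OpenSearch-style {name} and {name?} placeholders."""
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters may simply be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{(\w+)(\??)\}", substitute, template)


def federated_urls(templates, terms):
    """Build one query URL per source for the same search terms."""
    return {source: fill_template(t, searchTerms=terms) for source, t in templates.items()}


if __name__ == "__main__":
    for source, url in federated_urls(TEMPLATES, "world is flat").items():
        print(source, url)
```

Fetching each URL and merging the returned feeds is left out; that is where relevance ranking would have to be added.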

Libraries Australia [national union catalogue I think]

* considered replacing individual library catalogues (as a starting point) with "Libraries Australia, as the primary database to be searched by users"

to do this you need to be able to integrate well with all the OPACs, which of course is a problem

Expanding Borrowing

* Wake-up calls: statistics and commentary
- [Lorcan] Dempsey "Materials are not being united with users who want them" [not sure of quote, switched to next slide]
[You can see the entire quote and more info in Kent Fitch's presentation A New paradigm for “getting” (PowerPoint) from Libraries Australia Forum 2006 (LAF06)]
- Long Tail argument

Current Fulfillment
- charge $13 for ILL, total cost actually $49

steady decline in ILL

ILL: Strong disincentives to participate
- Expensive
- Slow
- Loss of control of assets
- Inconvenient / impossible

Fulfillment

Making "Search, find, get" seamless

* Lend directly from library to reader - NetFlix model - "NetBooks"

Also get your presence into e.g. Amazon.com "Borrow this book from Libraries Australia" (using greasemonkey)

PictureAustralia
http://www.pictureaustralia.org/

includes pics from Flickr
about 6000 images harvested


  "All you need in Aussie" - by Alicia Zappier 
  Originally uploaded by desertgirl.

MusicAustralia

* bought metadata catalogue of australian contemporary music - will interface to ecommerce gateway

Model

* "metadata discovery should be free, but you may have to pay for fulfillment"
* or I would say "easy access is more important than zero cost"

AustraliaDancing

* "Take Part" - customized wiki

There's more...

* People Australia, based on using authority file
* Open Access Journals - "really cheap and easy, blurs between publishing and collecting"

Issues

* Sustaining existing services
* Managing expectations
* Supporting innovation
* Enabling rapid prototyping
* Being vibrant and relevant

Ok, but how

* Reorg IT

* Review IT Architecture
- Service Oriented [emphasis mine]
- Consolidate metadata repositories
- "single business" model

* Create an IT-aware organization
- Communicate, collaborate and train

42 IT staff, over 400 library staff

Additional tech notes
- using Lucene full-text search
- using Confluence - new IT architecture is on a wiki
- training business analysts in BPM

Q: How does IT decide on priorities for projects?
A: Through the operational plan, reviewed by Corporate IT Group that sets the priorities

If necessary, escalate to corporate management.

Q: Web Harvest?  How made accessible?  Federated discovery - what challenges?
A: Paid the Internet Archive.  Ran for 6 weeks.  About 180 million files.  On a PetaBox, installed in Australia.  Internal access but not public.  Also the Internet Archive will put it up.

Permissions/legal issues within Australia.

Issues with robots.txt - if you follow robots, you may not get inline images or CSS.
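
To make the robots.txt point concrete, here is a small sketch using Python's standard urllib.robotparser.  The robots.txt content, the URLs and the "harvest-bot" user agent are all invented for illustration; the point is only that a site which disallows its /css/ and /images/ directories leaves a polite crawler with the HTML page but not the resources needed to render it.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: the page itself is allowed, its supporting files are not.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /images/
"""

# Invented resources that a single harvested page would reference.
PAGE_RESOURCES = [
    "http://example.org.au/index.html",
    "http://example.org.au/css/site.css",
    "http://example.org.au/images/logo.png",
]


def allowed_urls(robots_txt, user_agent, urls):
    """Return {url: True/False} for whether a robots-respecting crawler may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    for url, ok in allowed_urls(ROBOTS_TXT, "harvest-bot", PAGE_RESOURCES).items():
        print("fetch" if ok else "skip ", url)
```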

Federated search is basically around OpenSearch, plus may need to add relevance ranking.

Q (me): How are you doing relevance ranking?
A: Currently Teratext

but going to use Lucene.

September 25, 2006

Wolfram's awesome library hacking and mashups

I had the good fortune to meet Wolfram Schneider at ECDL 2006.

He has done some great work with Z39.50 searching and also library mashups.

He built his app quite a while ago - probably it was one of the first advanced Z39.50 apps on the web, I don't know.

His web app is called ZACK Gateway; it does a Z39.50 federated search (what librarians, always needing to be different, call "broadcast search") across a wide number of libraries.

ZACK - Z39.50 Gateway for international libraries

ZACK Gateway (in German, for Germany)

ZACK Bookmaps

It is particularly specialized for locating books in Germany, but it will also check a small number of libraries worldwide (he has better data for Germany).

Here's a map for The World is Flat (it will take a moment to load).

He has also done a mashup that displays the distribution of MARC and other formats (Z39.50 target types), by catalogue type, worldwide.

ZACK Z39.50 target maps (select a target type to get a map)

I wonder if maybe he could work with LibraryThing - I'm going to suggest this.

July 06, 2006

Technorati rank, updating, and multiple blog URLs

In light of Declan's article on science blog ranking, I thought I'd check out my Technorank.

It turns out that depending how you identify my blog URL, you get different results on Technorati.

In particular, I rank either 30,009th or 31,912th.

My default URL is scilib.typepad.com
this can also be reached through the longer URL scilib.typepad.com/science_library_pad/

Technorati thinks these are two different blogs.

*** 1 http://www.technorati.com/blogs/scilib.typepad.com

Science Library Pad

*** 2 http://www.technorati.com/blogs/scilib.typepad.com/science_library_pad

Science Library Pad

I'm going to claim the second URL, though I don't know if it will make any difference.

UPDATE: Here's an interesting twist.  I log in to Technorati and try to claim http://scilib.typepad.com/science_library_pad/ and get

Error: Unclaimable Blog      

Sorry, the URL you entered is not claimable.

... The only thing I can think of that I am doing that may screw them up, is that after I claimed my blog initially, I turned off the Technorati claim code in my blog - I have turned this back on now.  We'll see whether that changes anything.

March 09, 2006

a couple Web Services presentations, including SRU searching

Speaking of conferences outside the library domain, I wouldn't have guessed there would be technology relevant to me in a 2005 Conference for Law School Computing, but there are actually a number of interesting presentations.  Not all the slides are available, but webcasts are.

A REST-ful Web Services Approach to Library Federated Searching using SRU
presentation: http://www.cali.org/conference/2005/presentations/reisssac501230.ppt
webcast: mms://broadcast.cali.org/conf05/conf05sac501230.wmv

Developing Web Services to Spread the Word
webcast: mms://broadcast.cali.org/conf05/conf05th590230.wmv

This session will look at why web services, specifically SOAP and XML-RPC, are useful and worth developing as a means of providing wide access to various types of information.

I will use the development of the CALI Lessons web services API as an example.

(Unfortunately I found trying to follow the above presentation just from the webcast quite difficult.)

February 27, 2006

info from Federated Search Symposium

There was a Federated Search Symposium at the University of Calgary sponsored by The Alberta Library.

Info via Library Boy.

Distant Librarian has a good Federated Search Symposium wrap-up.

Roy Tennant of the California Digital Library did a presentation Metasearching: the Good,  the Bad, & the Ugly.

Also incredibly useful, the CDL posts their evaluations and assessments online http://www.cdlib.org/inside/assess/evaluation_activities/

The symposium website also has the other presentations, and since I know y'all want to see what the Google Scholar people had to say, here's the PowerPoint from Cathy Gordon, Director of Business Development - Google Scholar: Providing Visibility to Scholarly Literature.

The general tone I get is: metasearch = megapain.

I have a maybe naive question: why do we need to search, in the exploratory sense?  That is, within the specific context of library journal article holdings.  Don't we already know all of the journal issues we hold?  Isn't that the function of the whole licensing department?  Don't we have (or at least, have the ability to buy) the article metadata for all of our holdings?  What is there to be discovered?

Can't you just buy a service as follows:
1) I tell it what journals I have, ideally by having it talk to the publishers directly (e.g. machine to machine holdings API)
2) Since the service holds the article-level metadata (author, title, abstract) for every article, it just carves out that data into a search subset
3) I do a search on that subset.  No federation, since all of the data is in one place.
4) It points me to the fulltext - e.g. using my OpenURL resolver, which knows what stuff is held in what "databases" (websites)
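
The four steps above can be sketched in a few lines.  Everything here is hypothetical: the Article record, the function names, the sample data and the resolver base URL are stand-ins, and a real service would obviously hold millions of records in a proper index rather than a Python list.  It is only meant to show that once the metadata sits in one place, "federation" reduces to a filter followed by a search.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Article:
    issn: str
    title: str
    author: str
    abstract: str


def carve_subset(all_articles, held_issns):
    """Steps 1-2: keep only articles from journals the library holds."""
    return [a for a in all_articles if a.issn in held_issns]


def search(subset, term):
    """Step 3: one search over one data set, no federation."""
    needle = term.lower()
    return [a for a in subset if needle in f"{a.title} {a.author} {a.abstract}".lower()]


def fulltext_link(article, resolver_base):
    """Step 4: hand off to an OpenURL resolver (the base URL is hypothetical)."""
    query = urlencode({"genre": "article", "issn": article.issn, "atitle": article.title})
    return f"{resolver_base}?{query}"


if __name__ == "__main__":
    everything = [
        Article("1111-1111", "Crystal structures online", "Coles", "open data repositories"),
        Article("2222-2222", "Metasearch pain", "Smith", "federated search problems"),
    ]
    mine = carve_subset(everything, held_issns={"1111-1111"})
    for hit in search(mine, "crystal"):
        print(fulltext_link(hit, "http://resolver.example.edu/openurl"))
```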

September 12, 2005

NISO Metasearch Initiative presentation

NISO Metasearch Initiative Overview (PowerPoint)

from Triangle Research Libraries Network seminar.

One-search access to multiple resources will allow enabled libraries to offer portal environments so all library users can enjoy the same easy searching found in web-based services like Google. Unlike Google, however, a library's metasearch services must offer access to multiple database targets with a variety of content and accessibility. To move toward industry solutions, NISO is sponsoring a Metasearch Initiative to enable:

  • metasearch service providers to offer more effective and responsive services;
  • content providers to deliver enhanced content and protect their intellectual property;
  • libraries to deliver services that distinguish their services from Google and other free web services.

The presentation is quite detailed (66 slides) and gives a good picture of the varied activities of the different working groups, including some useful analysis of various technology options.  As indicated on their wiki, there are three task groups:

Task Group 1: Access Management

Task Group 2: Collection & Service Description

Task Group 3: Search and Retrieval

July 12, 2005

search Dogpile and compare Google, Yahoo and Ask side-by-side

If you haven't used Dogpile in a while, you might want to check it out again.
Click on the Google, Yahoo, and Ask Jeeves buttons to see the respective top results, listed in columns so you can compare them side-by-side.

Dogpile has a site about its comparison capabilities,

http://missingpieces.dogpile.com/

via Filipino Librarian

A couple other comparison tools are

http://www.langreiter.com/exec/yahoo-vs-google.html

and

http://twingine.com/

Twingine link via Lorcan Dempsey.

May 20, 2005

Atlantic Scholarly Information Network blog and info

I don't know much about the Canadian ASIN (Atlantic Scholarly Information Network) project, but they do have a blog with some info.  (Prepare for acronym fever.)

The CAUL [Council of Atlantic University Libraries] have begun implementing the ASIN Portal which includes single-signon authenticated access to the combined licensed and public resources of the CAUL libraries. The [Atlantic Provinces Library Association (APLA) conference] session will cover the individual components of the Portal including federated/broadcast searching, OpenURL resolution, Relais document delivery and Subject Rooms which are all maintained by the Sirsi’s Rooms context management system.

Via Stephen on CISTI Architecture internal bliki.

April 09, 2005

how to Google your catalogue

National Library of Australia News - RedLightGreen: How to 'Google' Your Library Catalogue

RedLightGreen is the simple yet powerful interface for the RLG Union Catalog, customised for college undergraduates. It looks like Google, Yahoo or other familiar Web search engines. But it leads students to print materials in their own libraries, and helps them locate significant, authoritative works. It returns results based on relevance and on how many libraries have that book, providing a gauge of the work’s importance. RedLightGreen also lets students check the availability of those works through links to their local library catalogue.

Normally, one would use a specialised system to search a library catalogue—and each catalogue has a different interface, customised for the underlying data structure. For librarians and scholars, these more elaborate interfaces can be very powerful tools; for a general audience, they can be quite frustrating.

via ResourceShelf

April 07, 2005

wise posting on federated search

Federated searching and why users aren’t finding/using your electronic materials

I think my alma mater would be a lot smarter to just add Google Scholar to their research databases and use SFX to allow students to click through to the full-text of each article. Lots of schools are doing it. There’s even a Firefox Extension for adding the OpenURL buttons to Google Scholar results! Very cool stuff! For now, that’s a far better option for finding materials from diverse databases than MetaLib. Plus, it’s free!

Sorry, I know I’ve been rambling and ranting, but I really think libraries could be providing better access to their online and print materials. 

March 21, 2005

make your sites for users

Incidentally, I was also interested to see that OSU Libraries place a search box to their metasearch service square in the middle of the library home page. This is alongside a list of library services. The library user does not have to spend time looking for things. It is nice just to see the list, each with an explanatory phrase. This seems much more helpful than a list of services clustered under headings which may or may not make a lot of sense to the user who has not been initiated into the inner workings of the library.

from Lorcan Dempsey - Oregon State University

February 12, 2005

ParaCite and the location problem

You've got stuff.  We've got stuff.  Everyone's got stuff and no one can find it.
We need, as many people have already figured out, a unified metadata searching environment, so that when I find some interesting-looking citation, I click, and it points me at a bunch of sources, starting with my own internal holdings.

We're not there, but in the meantime, you can hack it various ways.

ParaCite
is one approach.
Give it some citation metadata, and it will search around for matches.
Very cool.

If you look at the coverage it provides, you can see it tries everything from DOI, through OAIster, and on to Google Scholar.

February 07, 2005

VIEWS metaSEARCH whitepaper now available

Carl Grant of VIEWS and VTLS has been extremely responsive to the concerns I expressed about the VIEWS site.  The metasearch document is now available on their site

VIEWS Metasearch Committee - White Paper January 2005 (Word, 15 pages)

Metasearch, also known as aggregated searching or federated searching, refers to the rather complex process that allows an end user to submit a single query to multiple resources simultaneously and to receive a single processed list of results. Within the metasearch process exist four generally recognized separate tasks:

  • Authentication and authorization
  • Resource discovery
  • Search/retrieve
  • Statistical reporting

A metasearch web service must include the ability to search and to deliver results from that search (retrieve).

You may also want to check out the rest of the VIEWS documents page.

Previously:
2005-02-05 VIEWS - library Web Services

Update: Sorry somehow I had metadata on my mind, it's a paper on meta search.

February 05, 2005

VIEWS - library web services

Hey look, a whole organization with a website and everything.

VIEWS - Vendor Initiative for Enabling Web Services

VIEWS is an industry-wide cooperative effort to leverage libraries’ expertise in understanding, processing, and delivering information with the functional and practical efficiencies delivered through Web Services.

...

A primary goal of VIEWS is to make library services seamlessly available to the larger world of information handling and processing, whether through tools like Google, through use of e-learning products/services, or via information portals in general.

via Library Technology Guides: VIEWS releases Metasearch white paper

The VIEWS group appears to be led by VTLS, "Visionary Technology in Library Solutions".

Virtua, our flagship ILS product, features such visionary technology as FRBR (Functional Requirements for Bibliographic Records), Update Notifications through SDI, User Reviews & Ratings (a peer review feature), even a Smart Device (PDA) interface to the PAC.

The metasearch white paper is buried in their mailing list archive for some reason.  I tried to get it, but their server keeps timing out.  Here's a link I found, maybe you will have better luck

MetasearchWhitePaperFinal.doc

PS Here's a website design tip: If you don't have CONTACT US placed prominently and clearly on your website, it pisses people off.

UPDATE 2005-02-07: My concerns have been addressed.  See newer posting with link to metasearch document.

December 13, 2004

the merits of hackability

What I'm seeing with the search solutions that are coming out is that they are siloed due to branding.  They may search multiple types of information, but they all are centred around the branded service (Google, Yahoo, MSN etc.) without having any plug-in expandability capabilities.

This compares with Apple's approach to federated Internet searching, which is the Sherlock application.
You could make your own Sherlock plugins if you wanted to search additional services.
They have now moved away from that simple idea, and Sherlock development is now based around the idea of making channels.

Watson also had this idea of search that can be extended with plugins.

In Firefox it is easy to make new search plugins, but there's no federated searching across them (that I know of, anyway).  The Mycroft idea (the way you do Firefox search plugins) actually explicitly builds on Apple's Sherlock work.

So my point being, Google/MSN/Jeeves/Copernic/Yahoo/whomever desktop search is all very nice, but where's my completely integrated, federated search everything tool that takes plugins?

November 20, 2004

IL2004 - Monday - 13:15 Federated Searching and OpenURL (Part I)

Originally posted 2004-11-16.

“An Introduction to Federated Searching and OpenURL”
Frank Cervone
Assistant University Librarian for Information Technology
Northwestern University
Evanston, IL, USA

[Oh I see THEY have Internet (the presenters)]

federated search - search multiple resources (in background) - provide unified results list

using OpenURL to link to full text in results list

“the vast majority of OpenURL is done using URL rather than DOI”

presents OpenURL as a way to link source database to target database, e.g. from FirstSearch get OpenURL, your resolver turns that into a link to EBSCO

this is as opposed to having direct, internal links (typically to internal resources) in the source database

OpenURL
- redirect through resolver
- resolver interprets data which may be
-- DOI
-- a URL encoded with metadata about the resource
- locates appropriate copies
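
A minimal sketch of the two kinds of data a resolver might receive, using OpenURL 0.1-style key/value pairs (genre, issn, volume, spage, date, and id=doi:...).  The resolver base URL, the citation values, the DOI and the function names are made up for illustration, and the describe function only mimics the "interpret the data" step; it does not locate any copies.

```python
from urllib.parse import parse_qs, urlencode, urlparse

RESOLVER_BASE = "http://resolver.example.edu/openurl"  # hypothetical resolver address


def openurl_from_metadata(base, **citation):
    """Encode citation metadata (issn, volume, spage, ...) into the URL itself."""
    return f"{base}?{urlencode(citation)}"


def openurl_from_doi(base, doi):
    """Pass only an identifier; the resolver has to look the metadata up."""
    query = urlencode({"id": "doi:" + doi})
    return f"{base}?{query}"


def describe(openurl):
    """Mimic the resolver's first step: work out which kind of data arrived."""
    query = parse_qs(urlparse(openurl).query)
    if any(value.startswith("doi:") for value in query.get("id", [])):
        return "doi"
    return "metadata"


if __name__ == "__main__":
    by_metadata = openurl_from_metadata(
        RESOLVER_BASE, genre="article", issn="1234-5678", volume="12", spage="34", date="2004"
    )
    by_doi = openurl_from_doi(RESOLVER_BASE, "10.9999/example.2004.001")  # invented DOI
    print(describe(by_metadata), by_metadata)
    print(describe(by_doi), by_doi)
```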

they implemented SFX, it took them about 3 months
but Northwestern had “serial solutions” data
another place didn’t have that data and it took them 2 years to key it all in

they prioritized their results based on availability, comprehensiveness, whether you actually get it consistently when you click on it, quality of PDF etc.

you then have on-going maintenance of your OpenURL database, but most of this should be automated

“as nice as OpenURL is, it is just a stop on the destination to federated searching”

what is federated searching
- metasearch
-- uses metadata to make decisions
- megasearch
-- uses full text to make decisions
-- e.g. dogpile, alltheweb

federated search engine - ezproxy (remote auth and access) - world of electronic resources

once you have located the info (using federated searching) then use OpenURL to locate appropriate copy

people overwhelmingly choose a single, common, unified interface

they are building federated search for undergrads
live search
shows # of hits
can look at individual result sets

users (e.g. faculty) can select which databases they want
they have set a limit of 8 databases that you can federate at a time

also have “my resource list” - user’s last search, favourite resources (databases)
as well “my e-journals list” which shows the journals covered by the fave resources

OpenURL steps...
- linking to providers - get them to enable OpenURL, interfaces etc.

federated search major issues
- time to configure databases and resources
- dedup results / relevance ranking
- defining searchable collections
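
To show why dedup and relevance ranking are listed as major issues, here is a toy merge of ranked result lists from several databases.  The title normalisation and the reciprocal-rank scoring are my own simple assumptions, not anything the speaker or ExLibris described; the cap of 8 databases is the limit mentioned in the talk.

```python
import re

MAX_DATABASES = 8  # the limit mentioned in the talk


def normalize(title):
    """Crude dedup key: lower-case, punctuation collapsed to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_results(result_sets):
    """Merge {database: [titles in ranked order]} into one deduplicated, ranked list."""
    if len(result_sets) > MAX_DATABASES:
        raise ValueError(f"at most {MAX_DATABASES} databases can be federated at a time")
    merged = {}
    for source, titles in result_sets.items():
        for rank, title in enumerate(titles):
            entry = merged.setdefault(
                normalize(title), {"title": title, "sources": [], "score": 0.0}
            )
            entry["sources"].append(source)
            # Assumed scoring: sum of reciprocal ranks, so a record that appears
            # high up in several databases floats to the top.
            entry["score"] += 1.0 / (rank + 1)
    return sorted(merged.values(), key=lambda e: e["score"], reverse=True)


if __name__ == "__main__":
    hits = merge_results({
        "Database A": ["The World Is Flat", "Freakonomics"],
        "Database B": ["the world is flat!", "Blink"],
    })
    for hit in hits:
        print(f'{hit["score"]:.2f}', hit["title"], hit["sources"])
```

Real records rarely match this cleanly; differing titles, editions and metadata quality are exactly what makes dedup time-consuming in practice.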

They are using ExLibris for federated searching.

“The Long and Winding Road: Evolving E-Journal Management and Discovery Tools”
Cindi Trainor
Director, Information Technology
The Libraries of the Claremont Colleges

they use lots of Serials Solutions stuff
they also got RefWorks

doesn’t seem to actually be anything about federated search and OpenURL

she just gave us her ezproxy access
