Posts categorized "Searching"

April 28, 2008

availability, discovery, and delivery - redux

  • Availability does not equal accessibility: researchers’ top concern about scholarly communication is that they cannot access all the content they wish to access
  • Researchers tend to use tried-and-tested discovery tools, or those which their library specifically trains them to use. Google and other web search engines remain the most-used search tools for work-related information. The main problem with discovery is coming up against an access barrier
  • Researchers do not always know how to seek out a freely-available copy of an article that they want and which they have discovered behind a toll barrier

Key concerns within the scholarly communication process: report to the JISC Scholarly Communication Group, March 2008 [Word document]

via Lorcan Dempsey

I find it interesting that the focus of concerns is around delivery, not discovery (perhaps this is how the questions were framed).

I think the academic library faces two challenges:

  1. Ensure that your researchers can always get easily from their chosen discovery environment to
    1. Your licensed resources
    2. Free copies, if licensed versions aren't available
    3. Purchase options, if free copies aren't available
  2. Ensure that as many of your resources as possible can actually be discovered
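
To make the delivery order in challenge 1 concrete, here is a minimal sketch of the fallback as code. Everything in it (the DeliveryOptions record, its field names, the best_delivery function) is hypothetical and only illustrates the ordering; it is not how any particular link resolver works.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DeliveryOptions:
    """Hypothetical record of where one article can be obtained."""
    licensed_url: Optional[str] = None   # copy covered by the library's licence
    free_url: Optional[str] = None       # e.g. a repository or author copy
    purchase_url: Optional[str] = None   # publisher pay-per-view page


def best_delivery(options: DeliveryOptions) -> Optional[Tuple[str, str]]:
    """Return (kind, url) for the first available option, in priority order.

    Returns None when nothing is available, i.e. the researcher dead-ends.
    """
    for kind, url in (
        ("licensed", options.licensed_url),
        ("free", options.free_url),
        ("purchase", options.purchase_url),
    ):
        if url:
            return kind, url
    return None
```

A None result is the dead-end case; a "purchase" result for something the library actually licenses is the paywall case.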

I'm not convinced that we're doing a particularly good job of addressing these fundamental challenges even after years of working on proxies, federated search, link resolvers, and "live in your environment" plugins and external website settings.

It seems to me that librarians were so focused on trying to control the discovery experience, trying to make people discover resources "properly", following established librarian search protocols, that the simple challenges above were not addressed.

I think we need to spend a lot of time with researchers letting them search however they want, and seeing whether they dead-end either by not being able to get to a resource at all, or by landing at a paywall for a resource we license or can get to for free.  Fix that first.

And I'm not entirely convinced we have all the tools we need to fix that problem right now.

Then once you have addressed consistent delivery, work on improving discovery.  I think discovery is much harder to fix.  And to some extent, there should be vendor pushback.  I don't care how rich or comprehensive a licensed resource is: if my users can never discover it, then the message to the vendor should be "enable easy ways for my users to discover your resources within their preferred discovery environments, or next year we're not licensing your content".

Previously:
Lorcan and I had a bit of a back-and-forth about discovery and delivery in 2006.

August 08, 2006  Discovery and disclosure - Lorcan Dempsey
August 09, 2006  online library role in discovering and delivering - Science Library Pad

April 25, 2008

Yahoo - the SearchMonkey cometh

Well, as long as it isn't a flying monkey, ok.

Enter the Yahoo Open Strategy (YOS). ...

There’s a massive, latent social network within Yahoo, and we’re going to bring it to the surface. We’re making Yahoo more social, but we’re not building yet another social network. We already have an incredible social network… we just need to unlock it.

...

A first taste of our strategy is SearchMonkey, which will let developers mash up helpful data with our search engine results.  ... launch party May 15 [2008]

http://ycorpblog.com/2008/04/24/developer-welcome-mat/
(Exclamations removed because I refuse to put exclamations all over the place, including inside of acronyms.)

SearchMonkey is what I previously described as "semantically-enriched search results".

General story of Yahoo Open Strategy very widely reported, e.g. Globe - Associated Press - Yahoo plans social makeover.  SearchMonkey bit via Search Engine Watch blog.

March 27, 2008

Building SkyNet for Science - presentation for NISO Discovery Tools Forum

My presentation is available at

http://www.slideshare.net/scilib/building-skynet-for-science-discovering-new-frontiers-using-embedded-knowledge/

A lot of it is conceptual, so you may want to wait until the audio is available from the NISO site (hopefully next week).

UPDATE 2008-03-28: I forgot to mention that all of the supporting links for the presentation (will be) available at http://www.connotea.org/user/scilib/tag/nisodiscovery2008  ENDUPDATE

I thought it went well, although as first speaker up there is a disadvantage of not seeing how other people set up.  I was in a bit of a rush to get started so that I would finish on time, so I didn't do a great job of attaching my mike and I just held the wireless transmitter in my hand.  With the transmitter in one hand and my laser pointer gripped in the other for the entire 50 minutes, it's possible I looked a bit of a prat.

I usually try to remember to keep my hands free for presentations so that I can use more natural body language; anyway, lesson learned.

I also forgot to say "The future is not set.  There's no fate but what we make for ourselves." before the last slide.

There are some common themes emerging from the presentations.  I'm always amazed when a bunch of people develop presentations in isolation and then they actually all fit together when presented.

I've posted some photos of the presenters in my Flickr under nisodiscovery2008 (my cameraphone can upload directly to Flickr over WiFi); they also show up on the Upcoming page because of the machine tag linkage.  No pics of Chapel Hill yet, as I don't have a car, and it turns out that while we're only about 4km from the town, the most direct route for me to get there would, I think, be to walk beside a six-lane divided highway, which is not too appealing.

UPDATE 2008-03-28: The carbon offset for my flights from myclimate.org (including the trip to Open Repositories) was about C$118.

March 18, 2008

Google Book Search API

The Google Book Search Book Viewability API enables developers to:

                   
  • Link to Books in Google Book Search using ISBNs, LCCNs, and OCLC numbers
  • Know whether Google Book Search has a specific title and what the viewability of that title is
  • Generate links to a thumbnail of the cover of a book
  • Generate links to an informational page about a book
  • Generate links to a preview of a book

http://code.google.com/apis/books/
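
As a rough illustration of the "link using ISBNs, LCCNs, and OCLC numbers" idea, here is a small sketch that builds a lookup URL. As I recall the documentation, requests took a comma-separated bibkeys parameter of prefixed identifiers plus a jscmd=viewapi flag; treat those parameter names, the base URL, and the helper names bib_key and viewability_url as my assumptions, and check them against the documentation at the link above.

```python
from typing import Iterable
from urllib.parse import urlencode

# Identifier types named in the API description above.
KNOWN_PREFIXES = {"ISBN", "LCCN", "OCLC"}


def bib_key(kind: str, value: str) -> str:
    """Format one identifier as PREFIX:value (format assumed, see lead-in)."""
    kind = kind.upper()
    if kind not in KNOWN_PREFIXES:
        raise ValueError(f"unsupported identifier type: {kind}")
    return f"{kind}:{value.strip()}"


def viewability_url(keys: Iterable[str],
                    base: str = "https://books.google.com/books") -> str:
    """Build a lookup URL for several identifiers at once.

    The base URL and the 'bibkeys' / 'jscmd' parameter names are assumptions.
    """
    query = {"bibkeys": ",".join(keys), "jscmd": "viewapi"}
    # Keep ':' and ',' readable instead of percent-encoding them.
    return base + "?" + urlencode(query, safe=":,")


# Example values only; the OCLC number here is made up.
url = viewability_url([bib_key("isbn", "0451526538"), bib_key("oclc", "12345")])
```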

via LibraryThing blog - Google Books in LibraryThing - March 13, 2008

We need APIs everywhere for everything.

March 17, 2008

Semantically-enriched search results coming from Yahoo

In an upcoming talk I will be continuing a theme I started at Allen Press, calling for more semantic enrichment of scientific information online (I am, of course, only one of many making such calls).

It is therefore timely to see Yahoo offering an open platform for harvesting and returning semantically-enhanced search.

There was a pre-announcement on TechCrunch, followed by the official word on the Yahoo Search Blog

In the coming weeks, we'll be releasing more detailed specifications that will describe our support of semantic web standards. Initially, we plan to support a number of microformats, including hCard, hCalendar, hReview, hAtom, and XFN. Yahoo! Search will work with the web community to evolve the vocabulary framework for embedding structured data. For starters, we plan to support vocabulary components from Dublin Core, Creative Commons, FOAF, GeoRSS, MediaRSS, and others based on feedback. And, we will support RDFa and eRDF markup to embed these into existing HTML pages. Finally, we are announcing support for the OpenSearch specification, with extensions for structured queries to deep web data sources.

Yahoo Search Blog - The Yahoo! Search Open Ecosystem - March 13, 2008

You can sign up for more information at

http://tools.search.yahoo.com/newsearch/open.html

So what would an appropriate set of semantic information be for a scientific article?  What would your ideal search display include?  # of citations?  Impact Factor?  Chemical and gene sequences?  Price?  (Sometimes information wants to be expensive...)  How much can we fit into a couple of lines that will help to select one article over another in results?

UPDATE: And Yahoo is just one player in this space, as Paul Miller indicates in his posting Looking for a dominant Semantic Web search engine.

via Twitter mostly

February 20, 2008

Next Generation Discovery - NISO Forum - March 2008

Next Generation Discovery: New Tools, Aging Standards
March 27-28, 2008
Chapel Hill, NC

Discovering scholarly information and data is essential for research and use of the content that the information community is producing and making available. The development of knowledge bases, web systems, repositories, and other sources for this information brings the need for effective discovery -- search-driven discovery and network (or browse) driven discovery -- tools to the forefront. With new tools and systems emerging, however, are standards keeping pace with the next generation of tools?

I will be presenting.
I also (after fighting with Yahoo accounts) managed to make an Upcoming event.

Other confirmed presenters include Peter Murray, Karen Hawkins (talking about scitopia.org), and Eric Schnell & Dave Munger (talking about researchblogging.org and the BPR3 initiative).

I'm proposing a tag: nisodiscovery2008

UPDATE 2008-03-10: And a Twitter Hashtag #ndf08

December 30, 2007

Retrevo - searching for reviews

I know the search engine experts out there can probably list dozens of similar sites, but I liked the execution of this one: Retrevo.

If you search Google, you get a flat information space, which you have to narrow with keywords.  I like Google a lot for bringing relevant results to the top - but if it doesn't know why you're searching, of course it has problems.  In particular, if you search for just a particular part number, it will often list page after page of sites selling that item, with very few links to reviews or blog postings.

Retrevo specifically separates its results out into "Reviews & Articles", "Forums & Blogs", and "Shopping".  Nicely done.  Of course, the fact it ranks me first doesn't hurt...

http://www.retrevo.com/s/dpl700?sub.x=0&sub.y=0&sub=Search

November 05, 2007

Internet Archive 20th Century Search - DLF developers' preconference - Nov 3, 2007

Before the main event there was a preconference (with a fraction of the main attendees) exploring technical challenges and possible collaborations.  There were also a couple of brief presentations; here are my raw notes on Kris Carpenter Negulescu, the Director of the Internet Archive Web Group, talking about

"20th Century Find" using Amazon S3 & EC2

Internet Archive stats

3.5 PB
1.5 million downloads/day

this project is about providing full-text search of their web archive for the 20th century 1996-2000, ~22TB

NutchWAX = Nutch + Hadoop
focus moving to Nutch with plugins

Amazon S3
Amazon EC2 (beta)

there are now more EC2 node options: small (default), large, extra large (8 times small performance and better I/O, $0.80/cpu hour)

indexing began in October 2006
1996: indexed via 20 EC2 in ~36 hours
1997: 100+ EC2 nodes
1998: 300+ EC2 nodes
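
A back-of-envelope on the 1996 run (my arithmetic, not from the talk): 20 nodes for ~36 hours is about 720 node-hours. The sketch below turns node-hours into a cost. The hourly rate is a placeholder chosen for illustration, since the talk only quoted a price for the extra large nodes, and the function names are hypothetical.

```python
def node_hours(nodes: int, hours: float) -> float:
    """Total compute consumed, in node-hours."""
    return nodes * hours


def compute_cost(nodes: int, hours: float, rate_per_node_hour: float) -> float:
    """Cost of a run at a flat hourly rate per node (storage and transfer excluded)."""
    return node_hours(nodes, hours) * rate_per_node_hour


# 1996 index run from the notes: 20 nodes for ~36 hours.
run_1996 = node_hours(20, 36)

# ASSUMPTION: 0.10 per node-hour is an illustrative placeholder, not a quoted price.
cost_1996 = compute_cost(20, 36, rate_per_node_hour=0.10)
```

The node counts for 1997 and 1998 are given without durations, so the same estimate can't be made for those years from these notes.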

1999 was attempted in September 2007 using cluster of ~270 EC2 nodes but halted due to lack of consistent CPU/IO across nodes.

deployed (alpha) index is 1.35TB in size, no compression, ~600 mill docs

Enhancements
* multiple instances of a page
* improved ranking of results
* handle dimension of time
* easy UI

Why Amazon Web Services
* pay as you go
* simple to provision
* committed to support
* ideal for indexing Web pages, providing offsite storage, reliable hosting
* great platform for experimentation, iteration
* geographically dispersed from Internet Archive ?Data Repository?

Cost Effective, budgeted $20k

* Note: fees can add up fast if not vigilant

Working Well

* APIs
* tech support
* S3
* fee structure
* speed of provisioning
* S3 uniformity of nodes

Challenges - S3

Oct 2006 - June 2007

* (internal?) bandwidth availability
* no specific guarantees for data preservation
* issues related to popularity of the service

Fall 2007

* available bandwidth consistent (~4h to move 7.5TB into EC2)
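
For a sense of scale (my arithmetic, assuming decimal terabytes, i.e. 1 TB = 10^12 bytes): 7.5TB in ~4h works out to a bit over 500 MB/s sustained, or roughly 4 Gbit/s. The helper names below are hypothetical.

```python
def throughput_mb_per_s(terabytes: float, hours: float) -> float:
    """Sustained transfer rate in megabytes per second (decimal units)."""
    total_bytes = terabytes * 1e12
    seconds = hours * 3600
    return total_bytes / seconds / 1e6


def throughput_gbit_per_s(terabytes: float, hours: float) -> float:
    """The same rate expressed in gigabits per second."""
    return throughput_mb_per_s(terabytes, hours) * 8 / 1000


# Figures from the notes: ~4h to move 7.5TB.
mb_s = throughput_mb_per_s(7.5, 4)
gbit_s = throughput_gbit_per_s(7.5, 4)
```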

Challenges - EC2

lots of issues Oct 2006 - June 2007
* location of S3 nodes relative to EC2 was a significant factor for large-scale data processing

July 2007 - present

* working well but hitting IO and CPU constraints on small (basic, default) nodes; however will continue to use these small nodes

Consider Using AWS When

S3
* need cost-effective backup for data
* multi-provider preservation, geographically diverse

EC2
* if you have spiky computing needs (e.g. spikes in demand)
* you have available R&D resources

Will experiment with AWS for crawling and harvesting, starting Jan 2008, Heritrix/AWS.

July 20, 2007

Scirus Topic Pages now online

http://topics.scirus.com/

"curated" pages with various resources on a specific topic, e.g. Evidence-Based Laboratory Medicine

Previously:
June 28, 2007  Elsevier pre-announces Scirus Topic Pages

June 29, 2007

Casey Bisson on Scriblio and OpenLibrary

In the BIGWIG library social software showcase, Casey Bisson writes about his proposal to further enhance Scriblio.  Scriblio is a vision of a modern library web presence that is about finding books, not being a book inventory.

With its partners, the Internet Archive will build an open, structured website that includes tools to provide public and institutional access to library resources around the world. OpenLibrary.org will offer a system of integrated tools that can be used individually or together to meet a library or patron’s needs, that is free to the user and the library. The overall goal of the project is to shorten the distance between initial query and document.

June 28, 2007

Elsevier pre-announces Scirus Topic Pages

Topic Pages are an innovative way for scientists to communicate in an informal and flexible way. Each Topic Page will provide researchers with summaries of a specific topic written by an authority in the particular subject area, with direct links to relevant scholarly papers, abstracts and citations, supplemented with relevant websites and other online resources from Scirus. In the initial phase, authors for Topic Pages are invited through an editorial process facilitated through Elsevier publishing staff. As more pages are developed, additional authoring options will be considered.

...

At the official Topic Page launch later this year, the functionality of the Topic Pages will allow scientists and researchers to alter the content and provide feedback, allowing each topic to be shaped by the suggestions made by the research community.

from Elsevier PR - Elsevier Announces Partnership with FAST to Deliver Topic Pages for Scientific Communication on the Web

This is presumably the social network that was indicated in Joris van Rossum's Science-specific Search - Scirus and beyond talk at IATUL 2007.

UPDATE 2007-07-20:

http://topics.scirus.com/

June 14, 2007

Science-specific Search - Scirus and beyond - Joris van Rossum - June 14 - IATUL 2007

Joris van Rossum
Scirus, Amsterdam, The Netherlands

Science-specific Search: Bridging the gap in dissemination of and access to information

http://www.scirus.com/

Scirus is the free science search engine
he previously worked as a senior product manager at Scopus
(the pay science search engine http://www.scopus.com/ )

Content [Overview]
* How has content provision changed?
* How has information retrieval changed?
* Future trends

Internet has made journal publishing just one of many options for scientific content communication

why is there still journal publishing?
* stamp of authority
* versioning challenge - journal has version of record
* archiving - publishers ensure the article will always be available online

Increasing amount of content available online
* high amount of published content
- Scopus has 30 million abstracts
- ScienceDirect has 8 million articles
* amount of scholarly web content even higher
- Scirus currently indexes over 400 million scholarly web pages
(feel they need to add another 400 million on top of that in order to be complete at this moment)
* size of general web has exploded
- as of August 2005 Yahoo indexed over 19 billion pages
- Google says it indexes 3x more than that

Different content discovery methods
* browsing
* linking
* alerting
* searching
* user collaboration/sharing (potentially very strong, even replacing searching)

based on analysis from ScienceDirect usage logs

Browsing
* used to keep abreast of latest developments in subject area
* 31% of all full text article use on ScienceDirect is a result of journal browsing
* users that start this way download on average 1.9 articles

Linking
* very effective content discovery method
* publishers are collaborating through CrossRef to ensure correct reference linking
* 8% of all full text article use on ScienceDirect comes from reference linking
- Same is expected from 'cited-by' links
* next to reference and cited-by links in official literature there is
- web ref and cite-by
- patent ref and cite-by
- clustering
- author linking

citation paradigm applies beyond the official literature

Alerting
* journal issue [TOC] (RSS)
* top articles
* citation
* search

2-4% of downloads come from alerting

Search is the main driver of journal article use
* exponential growth of PubMed and ScienceDirect searches
- growth rates between 20-110% from 2001-2005
* search has overtaken browsing
[missed rest of slide]

search is important because it yields more than just journal results

* general web search (Google) often 1st choice for scientists (66%) and physicians (55%)

* subject-specific search platforms remain important
- avg. # of full text article downloads per session from PubMed is 3, from general search is 1.5

screenshot of Scirus, specialised science features

role of librarians in improving integration:
- search on library homepage, OpenURL (Scirus Library Partners)

*** User Collaboration and Sharing - The Future of Information Sharing ***

Combining browsing, linking, alerting and search in a community and network-driven system

... Scirus will launch a new community service in a couple weeks

Q: researchers aren't always searching for articles - in computer science we are interested in aggregates e.g. projects and research programs
A: the new service will offer topic-based collected search and resources relevant to a particular area

Q (Jens): journal publishing enduring... authority good - versions ? - archival not true!
A:

archiving ScienceDirect - The Hague - for free - even if Elsevier disappears, info will still be available

why is Elsevier concerned about archiving
- helps provide confidence to move everyone to e-only
- gives authors confidence

May 27, 2007

meta: in which I make a claim

Just a posting to try yet again to claim the 2nd URL that Technorati sees for my blog

Technorati Profile

In other news, Technorati now presents a single search box; to search for tags specifically, you'll have to go to advanced search.

UPDATE: It's working for both TypePad blog URLs now.  An interesting reminder about basing any decisions on a particular Technorati ranking result.

Previously:
July 6, 2006  Technorati rank, updating, and multiple blog URLs

May 24, 2007

Amaznode related book visualization

Amaznode uses Flash and the Amazon API to extract and visualize related books for a search.

See e.g.

paris

and

gps

Note: it will continue pulling in results for a while (about a minute).

Via BlogSchmog.

May 17, 2007

library conferences on a map

It seems to me it would take such a thin layer of widely-used microformats to be able to plot library conferences on a timeline and on a map, but for now the reality is conferences tend to be listed on static, hand-created web pages, all with widely-differing formats, and every single conference has its own static, custom web site.

It's therefore fairly remarkable that Google is able to make anything at all out of this mess, in order to plot a few library conferences on a US map

From Google experimental search

library conferences view:map

http://www.google.com/views?q=library+conferences+view%3Amap
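
The URL above is just the search text with the usual form encoding applied (spaces become +, the colon becomes %3A). A minimal sketch using only the Python standard library; the function name views_url is mine:

```python
from urllib.parse import urlencode


def views_url(query: str, base: str = "http://www.google.com/views") -> str:
    """Build the experimental-search URL for a query such as 'x view:map'."""
    # urlencode applies form encoding: ' ' -> '+', ':' -> '%3A'.
    return base + "?" + urlencode({"q": query})


map_url = views_url("library conferences view:map")
```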

April 05, 2007

Google MyMaps

Tonight Google launched MyMaps. It adds the ability to create and share maps directly from Google's site .... These maps will be added to Google geoindex and will be available to search in Google Earth and in Local Search. These maps will be syndicated via KML; support for GeoRSS syndication is coming (not surprising given that GeoRSS can be consumed by these maps - Radar post). Users are able to create their own maps and mark them public or private. The annotation tools that are provided are very simple and easy to use.

O'Reilly Radar - Google Launches MyMaps

UPDATE 2007-04-17: From my perspective, this is a fairly thin layer of additional functionality.  I do all my placemarks in Google Earth - I don't see any advantage to doing them in My Maps.  It's maybe not widely known, but Google Maps could already read Google Earth KMZ files - just enter the URL of the .kmz in the search box of Google Maps. 

There doesn't seem to be any way to directly import a KMZ in My Maps, but you can import placemarks one-by-one using "Save to My Maps".
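
Since a KMZ is just a zip archive with a KML (XML) file inside, it is also easy to pull the placemarks out yourself. Here is a minimal sketch using only the Python standard library; the function name is mine, and it ignores the XML namespace on purpose, on the assumption that files may declare different KML namespace versions.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List


def _local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def placemark_names(kmz_bytes: bytes) -> List[str]:
    """Return the <name> of every <Placemark> in a KMZ file's first KML entry."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
        # Take the first .kml entry found in the archive.
        kml_entry = next(n for n in archive.namelist() if n.lower().endswith(".kml"))
        root = ET.fromstring(archive.read(kml_entry))

    names = []
    for element in root.iter():
        if _local_name(element.tag) != "Placemark":
            continue
        for child in element:
            if _local_name(child.tag) == "name":
                names.append(child.text or "")
    return names
```

Usage would be something like placemark_names(open("places.kmz", "rb").read()), where places.kmz is whatever you saved from Google Earth.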

April 04, 2007

Google ponders health information

Quite a long posting in the Google blog pondering the most useful types of health information to provide:

How do you know you're getting the best care possible? - March 28, 2007

When I talk to people using Google to search for information about their health questions and how well search answers these questions, I hear several common concerns. I want to list them and discuss our thoughts about them.

They have a contact address for this topic: health@google.com

Google Desktop Search for Mac

The main feature I'm looking for is the ability to prevent indexing external drives - I don't want my desktop search going crazy every time I plug in an external 200+ GB drive.  Spotlight doesn't seem to remember this correctly, but reportedly Google Desktop does.

http://desktop.google.com/mac/

Review from ArsTechnica - Hands on with the new Google Desktop for Mac.

via Slashdot Google Desktop for Mac Released /.

April 01, 2007

the Amazon embeddable search widget

It's interesting that, at the same time as libraries are worrying about losing traffic to external websites, Amazon is building search widgets that embed the search results in the local page

http://typepadwidgets.amazon.com/

If you type a search into the widget below, it will return the results within the widget, rather than forcing you to go to Amazon.  To me, this is a visible demonstration of the possibilities provided by Web Services and SOA.  The only way you can have your data show up on a remote site is by making those data visible through a well-defined external interface.  Exposed services enable external embedding and visibility.  We have seen this previously with the Amazon aStore, and now with this TypePad widget.

I have not chosen the book list below.  I first thought Amazon was reading either post- or site-specific information (perhaps parsing the entire posting for contextual clues) in order to determine what books to display, but it looks like Amazon is using its Omakase technology, which recommends books based on your Amazon purchase and browsing patterns (as linked by an Amazon cookie).

Note: The widget may take a moment to appear, and it probably won't show up in a feed reader at all, due to the JavaScript libraries it needs from the web page.

   

February 23, 2007

in search of Autumn - a real-world quest

Anyone who has obsessively searched for a piece of information will appreciate this tale from Vanity Fair, about the search for the location of a treasured desktop wallpaper

Autumn and the Plot Against Me

via Digg

February 19, 2007

German scientific library services

Because how often do I get to use "Deutsche Forschungsgemeinschaft" in a sentence?

DFG - Scientific Library Services and Information Systems: Funding Priorities Through 2015 (PDF)

9 pages, English.

I think Lorcan will like this:

Today's libraries, archives and other specialised information services operate largely independently of one another.  These different institutions must integrate into a coherent nationwide network for the provision of digital information for science and the humanities.  By creating a digital environment within universities and research institutes in which digital channels become the standard medium for accessing, analysing and publishing research data and scientific results, libraries can become the cornerstone of e-science.

Another part that jumped out at me was about their scientific information portal

The creation of Vascoda, the nucleus of a "German Digital Library," sponsored mutually by libraries and specialised information services, represented an important building block in the development of an integrated system of national information provision.

via In Between

(In case you're wondering

The Deutsche Forschungsgemeinschaft [DFG] (German Research Foundation) is the central, self-governing research funding organisation that promotes research at universities and other publicly financed research institutions in Germany.

)

February 03, 2007

Google Scholar as good as the library?

This is the conclusion of a research study to be published in the next issue of Internet and Higher Education. I've emailed the authors to see if they're planning on depositing the paper in their Institutional Repository...  It's only a pre-print right now so I can't give you a complete citation, but if you subscribe to Science Direct you may be able to access the article here.

from The Distant Librarian

UPDATE 2007-02-06: You may have better luck with the DOI link

doi:10.1016/j.iheduc.2006.10.002

http://dx.doi.org/10.1016/j.iheduc.2006.10.002
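
The two forms above are the same identifier: the link is just the resolver address followed by the DOI. A minimal sketch of that conversion, with a function name of my own choosing and the resolver defaulting to the one used above:

```python
from urllib.parse import quote


def doi_to_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Turn 'doi:10.xxxx/yyy' (or a bare DOI) into a resolver link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Keep '/' as-is; percent-encode anything that is unsafe in a URL path.
    return resolver + quote(doi, safe="/")


link = doi_to_url("doi:10.1016/j.iheduc.2006.10.002")
```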

The study is quite small (N=27) and the number of citations compared is also small (N=72).  My summary would be "students can get good (trustworthy, high quality) citations by searching through their library or through Google Scholar, but if they search only Google they often get junk and don't realize it".  The specific metrics they used for quality were authority & reputation, objectivity, academic rigor, and transparency.  While these criteria are relevant for academic work, they may be less so for the general public.

December 30, 2006

Tapscott: turn your organization inside-out

Ok, the way he puts it is

Part 5: Platforms for Innovation

This is a very important, key concept.
In fact, let me say it again, with colour and size:

This is a very important, key concept.

This is absolutely foundational, in fact, to understand a lot of the issues that I talk about.

The core is as simple as simple can be: you get more from sharing than from secrecy.
But you have to understand the context of "secrecy".  In many cases, anything that is not published on the public Internet might as well be in a locked box on a high shelf.  In fact, you might as well print it out, send it to the new Library of Alexandria, and burn down the new library.

The benefits of sharing are so enormous, and the extra overhead so tiny, that I have never quite understood why people don't post more information publicly on the net.  My thinking has always been "if I do this for myself, maybe there is someone else out there it might be useful for".  Considering that a page I did for myself as a firewall sysadmin gets over a million hits per year, it appears this concept is valid.

During the time of the Clinton Administration, as reported (in passing) in The River at the Center of the World by Simon Winchester, they turned classification inside out.  Instead of saying "prove this should be declassified", they said "prove this should be secret".  That led to the author of the book getting a map of China that, I suspect, he would no longer be able to get today.

This is the challenge for your organization.  Don't ask yourself "what's the justification for sharing this information on the net".  Ask "can anyone strongly justify NOT sharing this on the net".

I see the most ridiculous stuff go on.  Librarians (and many others) take written notes at conferences, then laboriously transcribe them to a written report, which is then emailed to like two other people and immediately forgotten/lost.

People don't have crystal balls to see within your organization or within your mind.  If you don't share your information, you might as well be invisible.  How's that for a slogan?  "Share transparently or be Internet invisible".  Additionally, these are not the days of late-1994, early-1995 when I first put up a website.  Then an interesting effort was highly visible, because there were so few pages.

There are now billions of pages.  The good news is, you can still rise out of those "foothills".  The bad news is, if you have a search surface, an acreage (hectarage?) if you will, that is tiny, you have very little chance of being noticed.  If your land isn't continually being expanded and improved, in fact, it will probably get little (search engine) notice.  So here's another concept: "Sharing increases your search surface".  In security, we talk about making your "attack surface" or "vulnerability surface" as small as possible.  Internet presence is the opposite: you want to make your Internet surface as big as possible.

I know this is true, because my website and my blog bring me incredible contacts and opportunities that I could never have imagined, despite the fact that both Internet surfaces are often little more than "stuff that I wanted to research and think about anyway, which I'm sharing because there's no reason not to, and someone else may have similar interests".

Anyway, this posting has grown very long, but that is because it is of such critical importance.  Think beyond container to content.  Think beyond content to sharing content.  That is where everything is going to be happening.  That is going to be the new expectation.  If you've seen the controversy in Toronto and Ottawa about private municipal contracts and secret city meetings, that thinking is mired in the past.  Citizens, young and old, are going to be demanding the end to ridiculous, needless secrecy, from their governments and from all other organizations that they deal with.

So what is library Service-Oriented Architecture about?  It is about no more (and no less) than helping libraries collectively to build an open, shared platform, that enables sharing of containers (e.g. books) as well as sharing of content (e.g. articles).

Here's how Tapscott puts it

A growing number of smart companies are learning that openness is a force for growth and competitiveness. As long as you're smart about how and when, you can blow open the windows and unlock the doors to build vast business ecosystems on top of what we call "platforms for innovation."

Because I am a prideful creature, I must say I wrote this entire posting after having read only the first few sentences of Tapscott's article today.  It is therefore heartening synchronicity to read

Jeff Barr, who runs Amazon's Web services program says developers and marketplace sellers are "increasing the surface area of Amazon." They add more and more things to sell, in more and more places on the Web. All of this happens in a completely self-organizing fashion, which makes Amazon's already low overhead even lower.

So let's wrap it up with a final combination: library SOA will increase the Internet surface of all participating libraries.

December 18, 2006

are you #1?

Wise thoughts from Library Geek Woes

how realistic is it to try to be the #1 stop, virtually, for our patrons?

It's not. The world(s) of our customers does/do not revolve around us, no matter how much we might wish it so. Online, the competition is even more mind-boggling than offline. Libraries have been slow to realize that they can't compete with Google (Really. They can't. Let's move on.). Now some libraries believe that they can compete with the likes of Yahoo! as an Internet portal, or create online communities to rival those of Digg or MySpace. This is a result of misguided thinking and not understanding the current, crowded market. New social tools and communities are born every day (check out Mashable), and they have better programmers and bigger budgets and they still can't compete with the bigger services.

What I find amusing/sad is the obsession with Google and "beating" Google.

Next time you use Google, measure the amount of time you're on their site, before you've clicked away to the page that interests you.  5 seconds?  Less?  This is what we're concerned about, 5 seconds of someone's attention at the beginning of a search?  Wouldn't you rather be a high-ranked, useful search destination, or link to your content from other high-ranked search destinations (e.g. Amazon)?

I don't have a problem with Google.  Google sends me thousands upon thousands of visitors per year.

The library problem with Google is that libraries know jack about getting good rankings.  And no, you don't need SEO bulls--t to get good rank, you need lots of good content, frequently updated, that's easy for Google to index.

It's ridiculous that I, writing my little blog, outrank my entire 300-person organization in a search for canada library science, but I do.  That's not a failure of Google... hmm, who does that leave...

#9 Science Library Pad: Canada library building in Second Life
Science Library Pad. Thoughts on the use of technology and other issues for ...

#10 Canada Institute for Scientific and Technical Information
CISTI, the Canada Institute for Scientific and Technical Information, is a science library and a world leader in document delivery for all areas of science, ...

LGW link via LIS News

December 15, 2006

IT directions for National Library of Australia

Libraries, IT and Everything
Mark Corbould
Assistant Director-General
Information Technology
National Library of Australia (NLA)


presented at Library and Archives Canada (LAC)
December 15, 2006

presentation notes by me, Richard Akerman

- strategic vision of the national library of australia

- function: basically, maintain a national collection and make it available

- 479 staff
- over 550,000 [physical, presumably] visitors
- 110 million page views on website, up 59%

~ 9 million items (including 2 million manuscripts and 2 million serials)

Access

- 365,000 physical items delivered
- 45,000 reference enquiries (4,000 via Ask Now)
- 13 million digital objects delivered

~80 TB of digital object storage, including Australian Web Domain Harvest
- main stuff digitized is unique cultural heritage
- 40% of digital collection is oral history

Web Site Usage

- nice chunk on PictureAustralia, new chunk on MusicAustralia

- IT Strategic Plan (3 years, annual update)

Strategic Directions [phase] 1

* for a long time it was build, describe, preserve and provide access to the physical collection

Strategic Directions [phase] 2 [in addition to phase 1]

* since 2000 "to provide rapid and easy access..."
- outcome: online federated discovery services with "no dead ends"

Looking Outwards

* Internet-enabled collaboration
- lots of freely available content
- organizations are making their content available for remixing and repurposing
- low barriers to participation
- light-weight trust models, no MOUs
- sharing personal views of contemporary events
- creating social networks around areas of interest
- near enough is good enough
- loss of control
- need to be less risk averse

Strategic Directions [phase] 3 [in addition to previous]

* In 2006, "to enhance learning and knowledge creation by further simplifying and ... services..."
[slide switched before I got the rest]

Key IT Outcomes

1 ensure collecting record of Australia
2 To meet user need for rapid
5 Ensure relevance

[yes, he skipped a couple]

IT Goals

many including
* Develop and deploy new full-text search software
* Provide online spaces to support publishing, collaboration, contribution and interaction

The rest of the world wants to find stuff in Google etc.
"we've got to get our data out there"
worked with Google to get their books in Google Scholar

Get this item

* Bookshops
* Suppliers

Also A9.com - Libraries Australia is searchable by using OpenSearch lightweight protocol

working to do federated search across museums and other organizations, using OpenSearch
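
[To make the OpenSearch mention concrete, here is a minimal sketch in Python of how a client fills in an OpenSearch 1.1 URL template before sending a query to each participating organization. The {searchTerms} and {startPage?} parameters come from the OpenSearch specification; the endpoint URLs and the names fill_template, TEMPLATES and federated_urls are made up for illustration and are not the actual Libraries Australia addresses or code.]

```python
import re
from urllib.parse import quote


def fill_template(template, **params):
    """Fill an OpenSearch-style URL template.

    Required parameters look like {searchTerms}; optional ones end in '?'
    (e.g. {startPage?}) and become empty strings when not supplied.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            # Percent-encode the value so spaces etc. are safe in a URL
            return quote(str(params[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)


# Hypothetical endpoints, one per participating organization
TEMPLATES = {
    "library": "http://library.example.org/search?q={searchTerms}&page={startPage?}&format=rss",
    "museum": "http://museum.example.org/opensearch?terms={searchTerms}&format=rss",
}


def federated_urls(query):
    """Return the URL to request from each source for one user query."""
    return {name: fill_template(t, searchTerms=query) for name, t in TEMPLATES.items()}
```

[Each source publishes its own template, so adding a museum or archive to the federation is mostly a matter of adding one more template to the table.]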

Libraries Australia [national union catalogue I think]

* considered replacing individual library catalogues (as a starting point) with
"Libraries Australia, as the  primary database to be searched by users"

to do this you need to be able to integrate well with all the OPACs, which of course is a problem

Expanding Borrowing

* Wake-up calls: statistics and commentary
- [Lorcan] Dempsey "Materials are not being united with users who want them" [not sure of quote, switched to next slide]
[You can see the entire quote and more info in Kent Fitch's presentation A New paradigm for “getting” (PowerPoint) from Libraries Australia Forum 2006 (LAF06)]
- Long Tail argument

Current Fulfillment
- charge $13 for ILL, total cost actually $49

steady decline in ILL

ILL: Strong disincentives to participate
- Expensive
- Slow
- Loss of control of assets
- Inconvenient / impossible

Fulfillment

Making "Search, find, get" seamless

* Lend directly from library to reader - NetFlix model - "NetBooks"

Also get your presence into e.g. Amazon.com "Borrow this book from Libraries Australia" (using greasemonkey)
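
[A rough sketch of the logic such a script needs, written in Python rather than the JavaScript a real Greasemonkey script would use. It assumes that for most printed books the ten-character identifier in an Amazon product URL is the ISBN-10, and it uses a made-up library search address (LIBRARY_SEARCH); neither the URL nor the function names come from the presentation.]

```python
import re
from urllib.parse import quote

# Hypothetical library search address, not the real Libraries Australia URL
LIBRARY_SEARCH = "http://library.example.org/search?isbn={isbn}"


def is_valid_isbn10(candidate):
    """Check the ISBN-10 check digit (weights 10..1, total divisible by 11)."""
    if not re.fullmatch(r"\d{9}[\dX]", candidate):
        return False
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch))
        for i, ch in enumerate(candidate)
    )
    return total % 11 == 0


def borrow_link(product_url):
    """Return a 'borrow this book' link for a product page, or None.

    Looks for a ten-character identifier after /dp/ or /gp/product/ and
    only builds a link when it passes the ISBN-10 check.
    """
    match = re.search(r"/(?:dp|gp/product)/([0-9A-Z]{10})(?:[/?]|$)", product_url)
    if not match or not is_valid_isbn10(match.group(1)):
        return None
    return LIBRARY_SEARCH.format(isbn=quote(match.group(1)))
```

[The browser script would then insert that link into the product page next to the purchase options.]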

PictureAustralia
http://www.pictureaustralia.org/

includes pics from Flickr
about 6000 images harvested


  "All you need in Aussie" - by Alicia Zappier 
  Originally uploaded by desertgirl.

MusicAustralia

* bought metadata catalogue of australian contemporary music - will interface to ecommerce gateway

Model

* "metadata discovery should be free, but you may have to pay for fulfillment"
* or I would say "easy access is more important than zero cost"

AustraliaDancing

* "Take Part" - customized wiki

There's more...

* People Australia, based on using authority file
* Open Access Journals - "really cheap and easy, blurs between publishing and collecting"

Issues

* Sustaining existing services
* Managing expectations
* Supporting innovation
* Enabling rapid prototyping
* Being vibrant and relevant

Ok, but how

* Reorg IT

* Review IT Architecture
- Service Oriented [emphasis mine]
- Consolidate metadata repositories
- "single business" model

* Create an IT-aware organization
- Communicate, collaborate and train

42 IT staff, over 400 library staff

Additional tech notes
- using Lucene full-text search
- using Confluence - new IT architecture is on a wiki
- training business analysts in BPM

Q: How does IT decide on priorities for projects?
A: Through the operational plan, reviewed by Corporate IT Group that sets the priorities

If necessary, escalate to corporate management.

Q: Web Harvest?  How made accessible?  Federated discovery - what challenges?
A: Paid the Internet Archive.  Ran for 6 weeks.  About 180 million files.  On a PetaBox, installed in Australia.  Internal access but not public.  Also the Internet Archive will put it up.

Permissions/legal issues within Australia.

Issues with robots.txt - if you follow robots, you may not get inline images or CSS.
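
[To illustrate the robots.txt issue, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content, the site address and the "harvester" user agent are invented for the example; the point is only that a page can be allowed while the stylesheets and images it depends on are not.]

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the stylesheet and image directories
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /images/
"""


def allowed_resources(robots_txt, user_agent, urls):
    """Map each URL to True/False: may this user agent fetch it?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


# A page plus the resources it needs in order to render properly
page_resources = [
    "http://www.example.org/index.html",
    "http://www.example.org/css/site.css",
    "http://www.example.org/images/logo.png",
]

result = allowed_resources(ROBOTS_TXT, "harvester", page_resources)
for url, ok in result.items():
    print("fetch" if ok else "skip ", url)
```

[A harvester that obeys these rules archives the HTML but not its CSS or inline images, so the preserved page will not look like the original.]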

Federated search is basically around OpenSearch, plus may need to add relevance ranking.

Q (me): How are you doing relevance ranking?
A: Currently Teratext

but going to use Lucene.
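
[One reason federated search "may need to add relevance ranking" is that each source returns its own ranked list, and the scores (if any) are not comparable across sources. The sketch below merges such lists with reciprocal rank fusion. This is a technique I have swapped in purely for illustration: it is not how Teratext or Lucene rank results, and it is not something the NLA described. The function name and the constant k=60 are conventional choices, not taken from the talk.]

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked result lists into one.

    Each item earns 1 / (k + rank) from every list it appears in, so items
    ranked highly by several sources rise to the top of the merged list.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties broken by identifier for a stable order
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical record identifiers returned by two sources for the same query
library_hits = ["a", "b", "c"]
museum_hits = ["b", "c", "d"]
merged = reciprocal_rank_fusion([library_hits, museum_hits])
```

[Only rank positions are used, so this works even when a source returns plain RSS results with no scores at all.]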
