
March 29, 2008

The Two Laws of Robotic Librarians

In my NISO presentation I proposed a couple of new library laws.
For some background, here's some info from Wikipedia:

Ranganathan's Five Laws of Library Science (1931)

  1. Books are for use.
  2. Every reader his [or her] book.
  3. Every book its reader.
  4. Save the time of the reader.
  5. The library is a growing organism.

In 2004, librarian Alireza Noruzi recommended applying Ranganathan's laws to the web in his paper, "Application of Ranganathan's Laws to the Web":

  1. Web resources are for use.
  2. Every user his or her web resource.
  3. Every web resource its user.
  4. Save the time of the user.
  5. The Web is a growing organism.

I propose Two Laws of Library Science... for Machines

  • Every web resource its machine reader.
  • Save the time of the machine.

By this I mean that our web resources need to be readable not just by humans (the presentation layer) but also by machines, which have a hard time understanding presentation and natural language.  A machine can fall back on screen-scraping tricks, but that's fragile and time-consuming for the machine.  You may not think saving the machine's time is an issue, but there are two reasons it matters.  Firstly, as the content on the web grows, we want it to be parsed by machines as quickly as possible, so that we get immediate discovery of new information.  Secondly, code running locally on a laptop, or in particular on a mobile phone/PDA, may have limited compute and memory resources, and you may want that code to be able to alter web pages with additional discovery information fast enough that the user notices no delay.

Now I have no Semantic Web illusions that people are going to nobly go back and mark up all their content with semantic information.  That vision is a fantasy that lingers with us from the SGML days, and it's never going to happen.

Ralph LeVan took me to task, saying developers are not going to do extra work: the work only gets done if there is a business case, and developers are tasked with presentation GUIs for users, not with enriching web pages in invisible ways.

Well, yes and no.  People will do new things and extra work when they have a compelling motivation.  There may be many different motivations.  The system to register posts and get markup from ResearchBlogging is rather elaborate, but people do it because they want to be discovered.  Even a slight advantage in discovery can be a huge motivator to people.  That's why I think the Yahoo Open semantic initiative will bring a huge push for microformats.  And of course, it will never be people manually adding microformats in a big way anyway.  It will be our creation applications and tools that automatically insert microformats as appropriate.  Programmatically grabbing a DOI and inserting a visible citation is not a huge amount of effort... extending this to embed the citation as COinS is a minuscule additional step.
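To give a feel for how small that additional step is, here is a minimal sketch in Python, assuming the tool already has the citation fields in hand.  The function name coins_span and its parameters are made up for illustration, and the DOI in the usage note is a placeholder, not a real article.  COinS puts an OpenURL key/value string in the title attribute of an empty span with class "Z3988".

```python
from html import escape
from urllib.parse import urlencode


def coins_span(doi, article_title, journal_title, year, author_last):
    """Return a COinS <span> for a journal article.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    fields = [
        ("ctx_ver", "Z39.88-2004"),                       # OpenURL version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),  # journal metadata format
        ("rft_id", "info:doi/" + doi),                    # the DOI as an identifier
        ("rft.genre", "article"),
        ("rft.atitle", article_title),
        ("rft.jtitle", journal_title),
        ("rft.date", str(year)),
        ("rft.aulast", author_last),
    ]
    # URL-encode the key/value pairs, then HTML-escape the result so the
    # "&" separators are safe inside an attribute value.
    query = urlencode(fields)
    return '<span class="Z3988" title="{}"></span>'.format(escape(query, quote=True))
```

A publishing tool that already prints the visible citation could append coins_span("10.1000/example", "A Title", "Some Journal", 2008, "Doe") right next to it; the span is invisible to the human reader and costs the author nothing.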

And of course there will be people running very sophisticated algorithms on big networks of computers with loads of storage, to data mine out useful semantics, in particular about "science objects" like formulas, genes, chemicals etc. and then insert the proper microformats and identifiers for much simpler applications and machines to read.
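On the reading side, the "much simpler application" can be very simple indeed.  The following is a rough sketch, not a full microformats parser: it uses only the Python standard library to pull COinS citations out of a page, with no screen-scraping of the visible layout.  The class name CoinsReader and the function read_coins are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsReader(HTMLParser):
    """Collect the metadata of every COinS span found in a page."""

    def __init__(self):
        super().__init__()
        self.citations = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "span" and "Z3988" in classes:
            # HTMLParser has already turned "&amp;" back into "&", so the
            # title can be parsed directly as a URL query string.
            self.citations.append(parse_qs(attr.get("title") or ""))


def read_coins(html_text):
    """Return a list of dicts mapping each OpenURL key to a list of values."""
    reader = CoinsReader()
    reader.feed(html_text)
    return reader.citations
```

Calling read_coins(page_source) returns one dictionary per embedded citation, so a lookup such as citation["rft_id"] gives the machine the identifier without it ever having to interpret the presentation layer.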

Here's what Tim Berners-Lee had to say on the topic (from the transcript of a podcast done with Paul Miller):

Paul Miller: ... Another area that will require a huge amount of effort moving forward is around data for the Semantic Web. We're going to need an awful lot of it. Where are we going to get it from?

Tim Berners-Lee: There's an awful lot of data out there. And I think, one of the huge misunderstandings about the Semantic Web is, "oh, the Semantic Web is going to involve us all going to our HTML pages and marking them up to put semantics in them." Now, there's an important thread there, but to my mind, it's actually a very minor part of it. Because I'm not going to hold my breath while other people put semantics in by hand.

I'm not going to wait for other people to do it, and I don't want to do it either, to sort of add the semantics to HTML pages. So, where is the data going to come from? It's already there. It's in databases. So, most of this data is in databases. Often the data is already available through some kind of a Web interface.
