
December 11, 2007

CiteULike academic article bookmarking

I had a chance to look at the academic bookmarking service CiteULike.org and speak with the team working on it.  (Disclaimer: Kevin Emamy of CiteULike contacted me to set up this presentation.)

Kevin and Richard Cameron walked me through the site and gave me a sense of their philosophy about it.

The site started three years ago; their FAQ gives some of the history:

Richard Cameron wrote CiteULike in November 2004 and ran the service privately. In December 2006 Richard teamed up with Chris Hall, Kevin Emamy and James Caddy to set up Oversity Ltd. to further develop and support CiteULike.

They're getting to a volume of use (1.5 million records posted) where they can start to extract some of the "wisdom of crowds", and that sort of recommendation analysis will be one of their upcoming directions.  They already provide anonymised data for download, which I think is an admirable demonstration of information sharing.

They use "plugins" to extract metadata from websites; there are currently 45 of them.  These are parsers that run on the CiteULike site itself.  As the plugin developer documentation states:

Remember that CiteULike doesn't store actual URLs to articles - it tries to store the raw information required to manufacture a link. The reason behind this is that publishers can be quite brutal and insane sometimes about changing the URL structure on their sites. Sometimes (as in the case of Nature), they'll just break existing links to articles without telling anyone. In these situations, it's vital that CiteULike can dynamically produce the new style of URL so that the existing articles from that publisher in the system can still be accessed.

Storing the ingredients of a link rather than the link itself is a very wise choice.  Some sites also offer links for posting articles to CiteULike, e.g. Science magazine has "Post to CiteULike" in its Article Tools for individual articles.
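To make the link-manufacturing idea from the plugin documentation concrete, here is a minimal sketch in Python.  It is only an illustration and not CiteULike's actual plugin code: the `LINK_TEMPLATES` table, the `manufacture_url` function, the `examplepub` key, the example.org URL patterns and the identifier are all invented for the example.

```python
# Hypothetical sketch: store the "ingredients" of a link, not the finished URL.
# The publisher key, URL patterns and identifier below are invented.

LINK_TEMPLATES = {
    # One template per publisher; only this table changes when a
    # publisher reorganises its site.
    "examplepub": "http://old.example.org/articles/{id}.html",
}


def manufacture_url(source: str, ident: str) -> str:
    """Build a link at display time from the stored (source, id) pair."""
    return LINK_TEMPLATES[source].format(id=ident)


# What gets stored for a bookmark: a publisher key and a raw identifier.
bookmark = {"source": "examplepub", "id": "abc123"}

before = manufacture_url(bookmark["source"], bookmark["id"])

# The publisher changes its URL structure: update one template and
# every stored bookmark resolves to the new style of URL.
LINK_TEMPLATES["examplepub"] = "http://www.example.org/content/{id}"
after = manufacture_url(bookmark["source"], bookmark["id"])

print(before)
print(after)
```

The point of the design is that a publisher's reorganisation costs one template change rather than a rewrite of every stored bookmark.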

Although I didn't mention it, I wonder if COinS or something similar could address the metadata parsing issues.  Right now we have multiple sites with multiple different parsing approaches, which always strikes me as inelegant when I'm wearing my architecture hat.
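For readers who haven't seen COinS: the metadata is embedded in the page as an empty span with class "Z3988" whose title attribute carries a URL-encoded key/value description of the item.  A rough sketch of what a single shared parser could look like, using only the Python standard library, is below.  The `CoinsParser` class name and the page fragment (including the article details) are made up for illustration; this is not how any of the services mentioned here actually parse pages.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsParser(HTMLParser):
    """Collect the metadata carried by COinS spans (class "Z3988")."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "Z3988" in classes:
            # The title attribute holds a URL-encoded key/value string.
            self.records.append(parse_qs(attrs.get("title") or ""))


# Made-up page fragment; the article details are invented.
page = (
    '<p>Some article page</p>'
    '<span class="Z3988" title="ctx_ver=Z39.88-2004'
    '&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal'
    '&amp;rft.atitle=An+Example+Article'
    '&amp;rft.jtitle=Journal+of+Examples'
    '&amp;rft.date=2007"></span>'
)

parser = CoinsParser()
parser.feed(page)
record = parser.records[0]
print(record["rft.atitle"][0])
```

With something like this, one generic parser would work on every site that embeds the spans, instead of one hand-written parser per publisher.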

SIDEBAR: Connotea has (I think) parsers written in Perl.  Zotero has an architecture called translators.  2collab has a very small number of parsers.  END SIDEBAR

You can create groups, which can be either fully public or completely private.  There are various possible combinations of rights for joining, as well as the option to restrict new group users.  Once you are a member of a group, you will get a check box in your "Post To" section when you create a new bookmark.

One thing that I find a bit confusing about CiteULike is its system of rating bookmarks by their reading priority.  Even more confusingly, this is displayed as a number of stars, which makes it look like an article rating when it is actually about reading priority.  They said that because individual research domains are so specific and narrow, they don't believe article rankings would be meaningful.

That being said, there are group "Recommended" lists, which are constructed by aggregating the reading priorities assigned to articles posted to the group.
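I don't know exactly how the aggregation is done, so the following is just a guess at the general shape, written in Python.  It assumes priorities are numbers from 1 to 5 and that aggregation means summing them per article; the `postings` data and the `recommended` function are invented for the example.

```python
from collections import defaultdict

# Hypothetical postings to one group: (user, article, reading priority 1-5).
postings = [
    ("alice", "article-A", 5),
    ("bob", "article-A", 3),
    ("alice", "article-B", 4),
    ("carol", "article-C", 2),
    ("bob", "article-C", 1),
]


def recommended(postings):
    """Rank articles by the total reading priority group members gave them."""
    totals = defaultdict(int)
    for _user, article, priority in postings:
        totals[article] += priority
    # Highest total first; ties broken by article name for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


ranking = recommended(postings)
print(ranking)
```

A sum rewards articles that many members posted; an average would instead favour articles that a few people marked as urgent.  Either could plausibly sit behind a "Recommended" list.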

A smart thing they are doing is putting rel=nofollow on all their links, which makes the site less attractive to spammers.  Spammers are something that all social networking sites need a strategy for, particularly ones that are link-centric.
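For anyone unfamiliar with it, rel=nofollow is just an attribute on the link that asks search engines not to count the link towards the target's ranking.  A small Python sketch of rendering user-submitted links this way follows; the `render_link` function is hypothetical and says nothing about how CiteULike generates its pages.

```python
from html import escape


def render_link(url: str, text: str) -> str:
    """Render a user-submitted link with rel="nofollow", escaping both parts."""
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'


link = render_link("http://example.org/?a=1&b=2", "An example article")
print(link)
```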

There are some pages with Google Ads, but when I discussed their business model, it sounded to me like they were looking at something similar to LibraryThing for Libraries: providing aggregated data to third parties for a fee.  I can imagine something like the ability for article-centric sites to display "this article has been tagged X, Y, Z on CiteULike", but that is more me speculating than anything specific that they said.

I shared some of my thoughts about APIs and hopefully this will be an area for further discussion.

While doing some searches as part of writing this posting, I found an interesting audio interview with Richard Cameron in the Talking with Talis podcast series.
