I had a chance to look at the academic bookmarking service CiteULike.org and speak with the team working on it. (Disclaimer: Kevin Emamy of CiteULike contacted me to set up this presentation.)
Kevin and Richard Cameron walked me through the site and gave me a sense of their philosophy about it.
Basically, the site started three years ago; their FAQ gives some background:
Richard Cameron wrote CiteULike in November 2004 and ran the service privately. In December 2006 Richard teamed up with Chris Hall, Kevin Emamy and James Caddy to set up Oversity Ltd. to further develop and support CiteULike.
They are reaching a volume of use (1.5 million records posted) at which they can start to extract some of the "wisdom of crowds", and that sort of recommendation analysis will be one of their upcoming directions. They already provide anonymised data for download, which I think is an admirable demonstration of information sharing.
They use "plugins" to extract metadata from websites. There are currently 45 of them; each is a parser that runs on the CiteULike site itself to extract the article information. As the plugin developer documentation states:
Remember that CiteULike doesn't store actual URLs to articles - it tries to store the raw information required to manufacture a link. The reason behind this is that publishers can be quite brutal and insane sometimes about changing the URL structure on their sites. Sometimes (as in the case of Nature), they'll just break existing links to articles without telling anyone. In these situations, it's vital that CiteULike can dynamically produce the new style of URL so that the existing articles from that publisher in the system can still be accessed.
This is a very wise choice. Some sites also carry links for posting articles to CiteULike; for example, Science magazine has "Post to CiteULike" in the Article Tools for individual articles.
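The idea in the plugin documentation can be sketched roughly as follows. This is a minimal illustration under assumptions, not CiteULike's actual code: it assumes a hypothetical schema in which each record keeps a publisher key plus raw citation fields, and a per-publisher template turns those fields into a URL at display time. The names `URL_TEMPLATES`, `StoredArticle` and `manufacture_link`, and the example.org URL patterns, are all made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical per-publisher URL templates. The patterns are placeholders,
# not any real publisher's scheme.
URL_TEMPLATES = {
    "example-journal": "https://journal.example.org/vol{volume}/iss{issue}/p{start_page}",
    "doi": "https://doi.org/{doi}",
}


@dataclass
class StoredArticle:
    """Raw citation data stored instead of a URL (hypothetical schema)."""
    publisher: str
    fields: dict = field(default_factory=dict)


def manufacture_link(article: StoredArticle) -> str:
    """Build the current URL for an article from its stored raw fields."""
    template = URL_TEMPLATES[article.publisher]
    return template.format(**article.fields)


article = StoredArticle("example-journal", {"volume": "12", "issue": "3", "start_page": "45"})
print(manufacture_link(article))  # https://journal.example.org/vol12/iss3/p45

# The publisher changes its URL structure: update one template,
# and every stored article gets the new style of link.
URL_TEMPLATES["example-journal"] = "https://new.example.org/articles/{volume}-{issue}-{start_page}"
print(manufacture_link(article))  # https://new.example.org/articles/12-3-45
```

The payoff comes when a publisher reorganises its site: only the template changes, and every stored record picks up the new URL without being rewritten.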
Although I didn't raise it with them, I wonder whether COinS or something similar could address the metadata parsing problem. Right now multiple sites each take their own parsing approach, and, wearing my architecture hat, I always find that inelegant.
SIDEBAR: Connotea has (I think) parsers written in Perl. Zotero has an architecture called translators. 2collab has a very small number of parsers. END SIDEBAR
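To make the COinS idea concrete: COinS embeds an OpenURL ContextObject, URL-encoded, in the title attribute of an empty span with class "Z3988". A page could then expose its article metadata in one standard form, and any consumer could read it with a single generic routine instead of a site-specific parser. The sketch below assumes journal-article metadata held in a plain dict using a few OpenURL key names; `coins_span` and `parse_coins_title` are hypothetical helpers, and whether this would fit CiteULike's plugins in practice is an open question.

```python
import html
from urllib.parse import parse_qsl, quote, urlencode

# OpenURL keys for a journal article; only a subset of the full format.
ARTICLE_KEYS = ("atitle", "jtitle", "volume", "issue", "spage", "date")


def coins_span(metadata: dict) -> str:
    """Embed journal-article metadata in a COinS <span> (hypothetical helper)."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for key in ARTICLE_KEYS:
        if key in metadata:
            pairs.append((f"rft.{key}", str(metadata[key])))
    if "doi" in metadata:
        pairs.append(("rft_id", f"info:doi/{metadata['doi']}"))
    # Percent-encode keys and values (quote with safe='' also encodes "/" and ":").
    kev = urlencode(pairs, quote_via=quote)
    # HTML-escape the result so the "&" separators become "&amp;" in the attribute.
    return f'<span class="Z3988" title="{html.escape(kev)}"></span>'


def parse_coins_title(title_attr: str) -> dict:
    """Decode a COinS title attribute back into key/value pairs (hypothetical helper)."""
    return dict(parse_qsl(html.unescape(title_attr)))


span = coins_span({"atitle": "A study of tagging", "jtitle": "Example Journal",
                   "volume": 7, "doi": "10.1000/xyz123"})
print(span)
```

Because every page would use the same encoding, the per-site parsers mentioned in the sidebar could in principle collapse into one generic decoding step like `parse_coins_title`.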
You can create groups, which can be either fully public or completely private. There are various possible combinations of rights for joining, as well as the option to restrict new group members. Once you are a member of a group, you get a check box in the "Post To" section when you create a new bookmark.
One thing I find a bit confusing about CiteULike is its system of rating bookmarks by reading priority. Even more confusingly, this is displayed as a number of stars, which makes it look like an article rating when it actually indicates how soon you intend to read the article. They explained that, because research domains and individual research interests are so specific and narrow, they don't believe article rankings would be meaningful.
That said, groups do have "Recommended" lists, which are constructed by aggregating the reading priorities assigned to articles posted to the group.
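They didn't describe the aggregation itself, so the following is only a guess at its general shape. It is a minimal sketch assuming each posting carries an integer priority where a higher number means "read sooner", and that the group list ranks articles by the sum of those priorities. `recommended_list` and the (article_id, priority) input format are hypothetical.

```python
from collections import defaultdict


def recommended_list(postings, top_n=10):
    """Rank a group's articles by summed reading priority (hypothetical scheme).

    postings: iterable of (article_id, priority) pairs, one per member posting,
    where priority is assumed to be an integer and higher means "read sooner".
    """
    totals = defaultdict(int)
    for article_id, priority in postings:
        totals[article_id] += priority
    # Highest total first; ties broken by article_id for a stable order.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


group_postings = [("a", 5), ("b", 3), ("a", 4), ("c", 3), ("b", 2)]
print(recommended_list(group_postings))  # [('a', 9), ('b', 5), ('c', 3)]
```

Summing rather than averaging favours articles that several members have posted, which is closer to the "wisdom of crowds" idea; an average would let a single enthusiastic posting top the list.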
One smart thing they do is put rel=nofollow on all their links, which makes the site less attractive to spammers. Every social networking site needs a strategy for dealing with spammers, particularly link-centric ones.
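For readers unfamiliar with the mechanism: rel="nofollow" asks search engines not to pass ranking credit through a link, so a spammer gains little from planting one. A minimal sketch of rendering user-submitted links this way follows. `render_user_link` is a hypothetical helper, not CiteULike's code, and the http(s)-only scheme check is an illustrative extra precaution rather than something they described.

```python
import html
from urllib.parse import urlparse


def render_user_link(url: str, text: str) -> str:
    """Render a user-submitted link so search engines give it no ranking credit."""
    # Illustrative extra precaution: only allow http(s) links.
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"refusing non-http(s) URL: {url!r}")
    # Escape both the attribute value and the link text before emitting HTML.
    href = html.escape(url, quote=True)
    return f'<a href="{href}" rel="nofollow">{html.escape(text)}</a>'


print(render_user_link("https://example.com/paper?id=1&v=2", "Tagging & search"))
# <a href="https://example.com/paper?id=1&amp;v=2" rel="nofollow">Tagging &amp; search</a>
```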
Some pages carry Google Ads, but when we discussed their business model, it sounded to me as though they were looking at something similar to LibraryThing for Libraries: providing aggregated data to third parties for a fee. I can imagine, for example, article-centric sites displaying "this article has been tagged X, Y, Z on CiteULike", but that is more my own speculation than anything specific they said.
I shared some of my thoughts about APIs and hopefully this will be an area for further discussion.
While doing some searches for this post, I found an interesting audio interview with Richard Cameron in the Talking with Talis podcast series.