Posts categorized "Data Management"

June 04, 2009

UK open data, open government

I was sorely tempted to title this "would uk like some data, guv?"

The UK government is picking up the challenges issued in the excellent Power of Information Taskforce report.

Via Andy Powell in my FriendFeed, I find a Guardian article Free our data: UK set to follow successful US data method

Now the UK government has picked up on the idea, and in a post on the Cabinet Office blog Richard Stirling is asking the British public how a UK version of the US site should be implemented. "What characteristics would be most useful to you - feeds (ATOM or RSS) or bulk download by FTP?," he asks. "Should this be an index or a repository? Should this serve particular types of data eg XML, JSON or RDF?"

Although there is a list of dozens of the UK government's published data sources there is no clear pan-governmental approach to making data available. The proposal has been received with pleasure by a number of web developers and would-be data users, although it is not clear how free people would be to use the data commercially.

Richard Stirling is writing in the UK Cabinet Office Digital Engagement blog

http://blogs.cabinetoffice.gov.uk/digitalengagement/

At some point I will no longer be saying things like "yes, that's an official gov.uk blog" but... well, it is.

The four themes they list on their about page: open information, open feedback, open conversation, open innovation.

A more extensive extract of what Richard Stirling asks in his posting Information and how to make it useful :

Any solution must support open standards and would ideally be open source, but there are a couple of other questions we are pondering at the moment:

  • What characteristics would be most useful to you – feeds (ATOM or RSS) or bulk download by e.g. FTP, etc?
  • Should this be an index or a repository?
  • Should this serve particular types of data e.g. XML, JSON or RDF?
  • What examples should we be looking at (beyond data.gov e.g. http://ideas.welcomebackstage.com/data)?
  • Does this need its own domain, or should it sit on an existing supersite (e.g. http://direct.gov.uk)
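As a rough illustration of what the "feeds" option could mean for a developer, here is a minimal sketch of reading a dataset listing from an Atom feed using only Python's standard library. The feed content, dataset titles and example.org URLs are all invented for illustration; nothing here describes an actual government feed.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content: the titles and example.org links are made up.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example dataset catalogue</title>
  <entry>
    <title>Road traffic counts</title>
    <link href="http://example.org/data/traffic.xml"/>
    <updated>2009-06-01T00:00:00Z</updated>
  </entry>
  <entry>
    <title>School locations</title>
    <link href="http://example.org/data/schools.json"/>
    <updated>2009-06-02T00:00:00Z</updated>
  </entry>
</feed>
"""


def list_datasets(feed_xml):
    """Return a (title, link, updated) tuple for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    datasets = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href") if link is not None else None
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        datasets.append((title, href, updated))
    return datasets


if __name__ == "__main__":
    for title, href, updated in list_datasets(SAMPLE_FEED):
        print(updated, title, href)
```

In this sketch the feed acts as an index that points at the data files, which is one way of reading the "index or repository" question above.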

There are already 19 substantive comments, and he indicates they are also monitoring Twitter for the hashtags #poit (Power of Information Taskforce) and #opendata

There is a new Director of Digital Engagement, Andrew Stott.  According to his official Twitter feed, @DirDigEng, he was scheduled to start in his position yesterday.

Sometimes I feel like a certain country often considered to be between the UK and the US is missing out on this whole official open data, blogging, twitter thing...

If anyone were to want someone to start blogging officially about government open data in a certain northern neighbour of the US, I am available...

May 21, 2009

data.gov, Whitehouse open gov, Rewired UK Parliament and whither Canada

Data.gov is live.

The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. Although the initial launch of Data.gov provides a limited portion of the rich variety of Federal datasets presently available, we invite you to actively participate in shaping the future of Data.gov by suggesting additional datasets and site enhancements to provide seamless access and use of your Federal data.

But there's more.  Much, much more.

In the Whitehouse (yes, The Whitehouse) Open Government blog http://www.whitehouse.gov/open/blog/ they have announced a national consultation on open government.  Note the very aggressive timeline.

Today we are kicking off an unprecedented process for public engagement in policymaking on the White House website. In a sea change from conventional practice, we are not asking for comments on an already-finished set of draft recommendations, but are seeking fresh ideas from you early in the process of creating recommendations. We will carefully consider your comments, suggestions, and proposals.

Here’s how the public engagement process will work. It will take place in 3 phases: Brainstorming, Discussion, and Drafting.

Beginning today, we will have a brainstorming session for suggesting ideas for the open government recommendations. You can vote on suggested ideas or add your own.

Then on June 3rd, the most compelling ideas from the brainstorming will be fleshed out on a weblog in a discussion phase. On June 15th, we will invite you to use a wiki to draft recommendations in collaborative fashion.

These three phases will build upon one another and inform the crafting of recommendations on open government.

There are more details from the Collaboration Project

Today, the National Academy of Public Administration, in partnership with the White House Office of Science and Technology Policy, has launched an Open Government Dialogue to solicit ideas from the public on how the government can become more transparent, participatory, and collaborative. This online brainstorming session which is now open through 1pm on Thursday, May 28th, will enable the White House to hear your most important ideas relating to open government, including innovative approaches to policy, specific project suggestions, government-wide or agency-specific instructions, and any relevant examples and stories relating to law, policy, technology, culture, or practice.

We would like to ask you to participate by doing the following:

1. Go to http://opengov.ideascale.com/ to participate in the dialogue, and

2. Follow @ogovbrainstorm on Twitter to keep up with the highest rated ideas.

They are hashtagging things #ogov

But wait, there's more...
Sunlight Labs has announced Apps for America 2

I'm pleased to wave the green flag on Apps for America 2: The Data.gov Challenge. This is a development and visualization challenge to see who can come up with the best application and visualization for data from Data.gov. These are exciting times for us-- the walls between Government and Developers are starting to shrink, and we here in Sunlight Labs are terribly excited to get to work on doing great things with the data that's coming out. Government has made a move in the right direction-- now it is time for us to show them what we can do.

And in the UK, at a grassroots level, the Rewired State project has announced that Rewired Parliament is coming up, along with many hacking events.

above info from Twitter - edsu, ideascale, citymark and rewiredstate

There are no comparable Canadian federal initiatives.

Nothing even close. The only thing even remotely along these lines is the internal-only IT Innovation Campaign.  (Which, don't get me wrong, is an amazing development - just not on the scale and without the public visibility and engagement of the Obama administration's initiatives.)

There are some municipal level activities, such as the recent Vancouver open city announcement (see it on YouTube being read into the record), but nothing on a national scale.

My tiny little group is trying to bring up a site with some data about the budget (StimulusWatch.ca), but just that is a huge challenge.

We need Canadian open government leaders at all levels of government.  We can do some from the grassroots (as we just demonstrated at ChangeCamp Ottawa 2009), but we need our political leaders to embrace this vision.  Even if you don't care about the tech bits, here's the takeaway: opening up government and government data will create tremendous opportunities for technological innovation and efficiency, and increase the wealth and competitiveness of Canada.

Previously:
March 5, 2009  data.gov is coming - Vivek Kundra named US Federal CIO

May 19, 2009

a call to journalists to do serious reporting on content copying

Here is some Serious Business Reporting from the Globe

About 32 per cent of the computer software in Canada is pirated, contributing to losses of $1.2-billion (U.S.) in 2008 alone, according to a report from the Business Software Alliance (BSA). If Canada were to crack down and get its piracy rate to around 23 per cent – close to the U.S. rate of 20 per cent – it could result in 5,200 new jobs and contribute $2.7-billion to the country’s economy by 2011, according to a 2008 report from market research firm IDC.

Globe and Mail - Download Decade - New media, old rules - May 19, 2009

Wow: jobs, billions, Business Alliances, all very professional.
There's only one problem.  Those numbers are sh-t.  The RIAA, MPAA, BSA, and CRIA all report dire numbers for piracy and billions lost, numbers which they obtain by... making them up.

Seriously, they just pull them out of thin air.  I might as well say eliminating software piracy would cause Canada to have more pleasant summers and would increase the bison herd.  There's just as much evidence for that.

It is in the interest of industry associations to make digital content copying - which I might add is impossible to prevent technologically - out to be some giant economy-crushing disaster.  In the face of counter-evidence (tens of millions in ticket sales for the latest Star Trek, for example) their argument is that some imaginary amount of MORE money would be made, if not for the dread piracy gap.  This is complete, total, unscientific, evidence-free public relations nonsense.

I call on all serious journalists to follow the trail of these numbers from the industry advocacy organisations and do some investigation.  I guarantee you will find that they come from nowhere; they're simply made up.

Stop reporting these numbers as facts.  They're not.  They're basically idle speculation dressed up with scary numbers.

There are lots of real issues to report on, for example, incredibly complex international rights agreements that mean Canada almost always gets online content access later than and to a lesser extent than the United States, and archaic Crown Copyright and cost-recovery approaches in government that mean Canadians can't access their own digital data, paid for with public funds.

May 16, 2009

Google and the web of structured data

Google has announced it will be using (some) microformats and RDFa to enrich search results.  They call this Rich Snippets.

To display Rich Snippets, Google looks for markup formats (microformats and RDFa) that you can easily add to your own web pages. In most cases, it's as quick as wrapping the existing data on your web pages with some additional tags.

Official Google Webmaster Blog - Introducing Rich Snippets - May 12, 2009
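To give a feel for what "wrapping the existing data with some additional tags" looks like from the machine's side, here is a small sketch that pulls RDFa-style property values out of a page fragment using Python's standard library. The markup, the "ex:" vocabulary prefix and the review content are invented for illustration and are not taken from Google's documentation. The extractor is deliberately naive: it only handles flat, un-nested elements.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the "ex:" vocabulary and the content are made up.
SAMPLE_HTML = """
<div xmlns:ex="http://example.org/vocab#" typeof="ex:Review">
  <span property="ex:itemreviewed">Example Cafe</span>
  reviewed by <span property="ex:reviewer">A. Reader</span>,
  rating <span property="ex:rating">4.5</span>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect the text of every element that carries a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # property name of the element being read, if any

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Naive: any closing tag ends the current property (no nesting support).
        self._current = None


def extract_properties(html):
    """Return a dict mapping property names to their stripped text values."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return {name: value.strip() for name, value in parser.properties.items()}


if __name__ == "__main__":
    print(extract_properties(SAMPLE_HTML))
```

The human-readable text stays exactly where it was; the extra attributes simply tell a program which piece of text is the reviewer, which is the rating, and so on.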

O'Reilly Radar says

Moving toward the Semantic Web will allow our searching technologies to become more intelligent and will set the stage for the next revolution in which computing systems can become more aware of the "meaningfulness of data".

We've already seen a shift toward "semantic search": Google has already been augmenting search results with Google Maps, limited catalog searches, and more recent entries into the search market such as Amazon's A9 and the yet to be released Wolfram Alpha differentiate themselves by the structured data and content that can be extracted from a search result. We have yet to a see a compelling reason for web masters to place RDFa or microformats into a site to enable this semantic data to be mined until today, until Google provided a social incentive for site designers.

O'Reilly Radar - Google Announces Support for Microformats and RDFa - May 12, 2009

Google has a help file to get you started: Marking up structured data.

Incidentally, if you're thinking, "why didn't someone tell me this structured data thing was coming?" I should mention that actually I did try to tell people, whether it was in my presentation to Allen Press in 2007 (where I talked about the need for microformats and semantic enrichment) or in my keynote to NISO Discovery last year (where I talked specifically about Yahoo SearchMonkey using semantic information).

Previously:
September 8, 2008  semantic search thoughts

Wolfram Alpha and the web of structured data

The good news is, WolframAlpha can understand a query like "unemployment in Ottawa".
[wolframalpha-ott-unemploy]
The bad news is, as you can see in the Result, that while it knows what data it needs to compute the answer, all it can report is "(data not available)".

This is as clear an example of why we need open data as I can think of.  If you want your computers to start providing you with smarter answers, you have to give them better information to work with.

WolframAlpha provides two channels to try to address this problem: Contribute Structured Data and Suggest Data Sources.  You can also contribute individual facts, but that won't scale very well.

May 15, 2009

the Open City - the next driver for innovation?

Mayor Gregor Robertson and Coun. Andrea Reimer want the City of Vancouver to support open-source software and open standards.

They also want the city to make as much data as possible freely available to the public. Reimer will introduce a motion [PDF] next Tuesday (May 19) that would see the city endorse the principles of open source, open standards, and open data, as well as start work on publishing data on the Web using open standards.

In a press release issued today (May 14), Robertson said that an “open city” philosophy would help create new opportunities in the information-technology sector.

City of Vancouver set to back open source, open standards, open data - straight.com - May 14, 2009

via Twitter - Rob Giggey (Rob works at the City of Ottawa) - May 15, 2009

(I tried to find the Gregor Robertson press release referenced above, but I haven't been successful - can anyone point me to it?)

Toronto Mayor Miller has also announced toronto.ca/open (which still shows "under construction").

In Ottawa, supported by some City of Ottawa staff but not (yet?) endorsed as any kind of official policy, we're starting the open data discussion as part of ChangeCamp Ottawa.

What can you do as citizens and what can we do as libraries to enable the sharing of our civic data?  Is sharing civic data a next logical step for public libraries as enablers of the public space?

Previously:
May 5, 2009  Web APIs explained on CBC Spark - and Open Data Under Construction

May 05, 2009

Web APIs explained on CBC Spark - and Open Data Under Construction

(I almost wrote Spark CBC, since that's their Twitter name.)

Spark Episode 76 (audio link available directly in the post, as well as various podcast options)

At 22:02 or so in, they take on the challenge of explaining web APIs, or more specifically, they ask Jer Thorp to help walk them through the concept.  It's always interesting to hear the descriptions people use.  For example, I would generally say "machine-to-machine", which is probably way too abstract.  I also tend(ed) to describe APIs in the context of Service-Oriented Architecture, which probably confused the issue (and the audience).  I don't generally talk about computer programs communicating with other computer programs.

I think in general what's presented on the show is a pretty good explanation: websites are opening up their information using APIs, so they can leverage open innovation - outside developers.  We are a long long way from a completely interoperable web of standard APIs though.

Here's the Twitter-sized explanation I had proposed (taking quite a lot of my space to talk about how there wasn't enough space):

scilib: @sparkcbc I don't think that will fit in a tweet. Basically standard interfaces (APIs) allow data to flow between sites, which = mashups.

I would argue as well that web development has gotten sophisticated enough that, while APIs are ideal (at least if well constructed), you can actually get a lot by opening your data, which is the key first step.  Open data enables mashups, APIs just make mashups easier.  Open data means sharing the information your organisation has, out on the web - ideally your default becomes to share.
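To make "open data enables mashups" a bit more concrete, here is a minimal sketch of a mashup built from nothing more than two published files: one CSV and one JSON, joined on a shared key. The ward names and all the numbers are invented for illustration; no real city data is implied.

```python
import csv
import io
import json

# Hypothetical open data files, as two different organisations might publish them.
LIBRARIES_CSV = """ward,libraries
Ward A,2
Ward B,1
"""

POPULATION_JSON = """[
  {"ward": "Ward A", "population": 40000},
  {"ward": "Ward B", "population": 30000}
]"""


def mashup(libraries_csv, population_json):
    """Join the two datasets on ward name and compute people per library."""
    population = {row["ward"]: row["population"] for row in json.loads(population_json)}
    combined = []
    for row in csv.DictReader(io.StringIO(libraries_csv)):
        ward = row["ward"]
        libraries = int(row["libraries"])
        people = population.get(ward)
        # Only compute the ratio when both values are present and non-zero.
        per_library = people / libraries if people and libraries else None
        combined.append(
            {
                "ward": ward,
                "libraries": libraries,
                "population": people,
                "people_per_library": per_library,
            }
        )
    return combined


if __name__ == "__main__":
    for row in mashup(LIBRARIES_CSV, POPULATION_JSON):
        print(row)
```

No API was needed here, only files in open formats with a common identifier.  An API would mostly save the step of downloading and parsing the files by hand.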

We're still in early days of open data.  The Guardian calls their approach "Data Store - Facts You Can Use".  I've written previously about the US Data.gov initiative, which currently has the world's simplest website (a giant box reading "coming soon"), but I think is supposed to launch this month.  It's similarly challenging to point to open data cities, because while the Twitter-enabled Toronto @MayorMiller announced toronto.ca/open at Mesh, it also reads simply "Under Construction".

What will be possible is mashups, visualizations, APIs, analysis and much more.

I believe the long term success of projects like StimulusWatch Canada and ChangeCamp Ottawa will depend on open data, and (eventually) on all levels of government having open APIs as well.

Which circles me around to the opening topic of the podcast, about whether online activism ("slacktivism") can actually translate into meaningful real-world activity.  The answer, I think, is tied in with the segment about lurking... the web is mostly lurk, only maybe 10% participate.  Some tiny fraction of those online participants might translate into offline actions.  Maybe one in a thousand?  But nevertheless, it does happen.

While I generally refuse to join these "click your support" Facebook groups (in part because I don't like FB much anyway), they can be low barrier entry points, in particular since so many Canadians (who may otherwise not be very social-web enabled) are in FB.

The kind of canonical Canadian example is the Fair Copyright for Canada group, with its (at time of posting) 90,071 members.  It was brought up in the House of Commons.  It did translate into some offline activism.  And the sheer numbers did, I think, both get attention and generate concern for the party proposing the bill.  There are still lots of issues with that number.  Lots of people around the world care about copyright.  For all I know, that's 81,000 copyright-concerned Americans, and 9000 Canadians.  Such is the global web.

I do think "feel-good clicks" are a bit dangerous: they give you the perception of action without actually doing anything.  I've long been concerned by this kind of almost mystical power ascribed to online organising.  In my review of Al Gore's The Assault on Reason, I said

Don't get me wrong, I think the Internet has a role to play in reasoned discourse. A small role. A useful tool for pointing attention to falsehoods and referencing inconvenient truths. But electronic communications have a fatal allure of virtual action.

Concerned about the environment? No need to go outside and walk in the woods, or clean up a polluted lot in your neighbourhood, or knock on your representative's door and explain the urgency of your position.

No, instead you can just fire off an email, write a blog posting, and then turn up the air conditioning and the lights and stretch out on the couch and read a good book.

That being said, I have myself translated the online into real world action on a number of occasions.  As I wrote in the StimulusWatch blog, it was an online posting that led me to an event that started a chain leading to the creation of the project.

That same event, and online chatter about a local conference, also led me (as partially outlined in my posting Making government data visible - and is Change coming to Ottawa?) to ChangeCamp Ottawa, a very real event happening at City Hall on May 16, which I have been helping to organise, an event which of course has a substantial online presence including a social network for the specific event, as well as being part of the larger ChangeCamp group on Facebook.

Similarly, a local news article in a free neighbourhood paper (yes, in print, with ink and everything) about a small garden/park space led me to a Facebook group which led me to an offline meeting which led me to create http://www.savethegarden.ca/

And of course, on a much more spectacular scale, the Obama campaign used (and continues to use) online organising as a tool, but they were very clear that the purpose of online was to drive a very extensive (and successful) ground game, people talking and knocking on doors, calling on phones, out in the real world.

So I think when it works best, the online world leads you offline, and offline leads you back online.  It's an ongoing discussion that flows across place and time.

Discussions enable meetings, data enables websites, websites enable more meetings, meetings come to consensus on APIs, APIs enable mashups... round and round it goes.

April 29, 2009

Google visualizes open government data

Official Google Blog - Adding Search Power to Public Data

We just launched a new search feature that makes it easy to find and compare public data. So for example, when comparing Santa Clara county data to the national unemployment rate, it becomes clear not only that Santa Clara's peak during 2002-2003 was really dramatic, but also that the recent increase is a bit more drastic than the national rate

There is also a video about how it works.

via ResourceShelf

So Google, whom do I talk to about adding Canadian data?

April 15, 2009

Talis hosting linked data in their Connected Commons

I have talked about Amazon's initiative in this area, but in looking back I see it was a bit buried in a larger discussion about library roles

Data is very different from journal articles - it lacks a standard format, and the resources it can consume - into the petabytes - make it a daunting task for any organisation or set of organisations to take on.  I really admire the practical work that Amazon is doing with Public Datasets (thanks I suspect in large part to the vision of Deepak Singh).  The most practical things we can do right now are to share what data we have, think about what open data will mean, and try to get more and more data openly shared.

This brings me to an email discussion I had with Leigh Dodds about Talis Connected Commons - thanks are due to Leigh for reminding me to look at the Talis approach.

Talis Connected Commons itself is about fostering the Linked Data community, by providing a rich hosting service:

For qualifying data sets, Talis will provide, through the Talis Platform:

  • Free hosting of up to 50 million RDF triples and 10Gb of content
  • Access to data access services that operate on that data, including data retrieval and text search
  • Free access to a public SPARQL endpoint for each dataset.
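For readers who haven't met RDF triples, here is a toy sketch of the idea: every fact is a (subject, predicate, object) statement, and a query is a pattern with some positions left open. This is plain Python over an in-memory list, not the Talis Platform API and not real SPARQL; the "ex:" identifiers and dataset names are invented for illustration.

```python
# Each triple is (subject, predicate, object); all identifiers are made up.
TRIPLES = [
    ("ex:dataset1", "ex:title", "Air quality readings"),
    ("ex:dataset1", "ex:publisher", "ex:cityA"),
    ("ex:dataset2", "ex:title", "Bus timetables"),
    ("ex:dataset2", "ex:publisher", "ex:cityB"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern.

    None acts as a wildcard, loosely like a variable in a SPARQL triple pattern.
    """
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


if __name__ == "__main__":
    # "What are the titles of all datasets?"
    for s, _, title in match(TRIPLES, predicate="ex:title"):
        print(s, title)
    # "Who publishes dataset2?"
    print(match(TRIPLES, subject="ex:dataset2", predicate="ex:publisher"))
```

A real SPARQL endpoint answers the same kind of pattern question over the web, across millions of triples, and can join several patterns together in one query.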

I asked Leigh how this fits with the Talis Project Xiphos initiative, and he explained that Xiphos is a more focussed initiative around "data in the education, library and publishing sectors", whereas Connected Commons is about any kind of data.

Talis, like Amazon, understands that a modern business is about fostering an ecosystem, a combination of shared data and services that can be used as a platform for software development and business development.

March 15, 2009

illuminate the government's operations with open data - Sunshine Week

The time has changed, the times have changed, the days are getting brighter - what better time for Sunshine Week.

Sunshine Week is a [United States] national initiative to open a dialogue about the importance of open government and freedom of information. Participants include print, broadcast and online news media, civic groups, libraries, non-profits, schools and others interested in the public's right to know.

March 15-21, 2009

They have a blog, Facebook, Twitter and all that groovy stuff.  I couldn't find a declared tag, so I'm just using the rather long "sunshineweek".

Through their Twitter I find an interesting article from The Economist - Track my tax dollars.

THE taxpayers do their part, and faithfully fling their hard-earned treasure into the gaping public maw. Surely they should be allowed to know what happens to it. So why not put government spending online?

This also gives me a chance to link to a great TED video of Sir Tim Berners-Lee (inventor of the WorldWideWeb) talking about "linked data".  He talks about the importance of sharing data because you can build so much on raw data.  The data part starts at about 3:58 in, and the government data specific part starts about 9:46 in.



I also very much liked his simple message of "Raw Data Now!" - my experience with Linked Data, at least at Open Repositories 2008, was that it very rapidly descends into obscure discussions about the philosophy of data representations (e.g. "do we point at the data, or the metadata about the data?"). The most important thing is to just get the raw data up, and then we can work to have wonderful semantic markup things once we actually have something to work with.

Tim B-L video via ReadWriteWeb via TED

March 14, 2009

ICSTI 2009 - Managing Data for Science

The ICSTI 2009 conference has a great lineup of speakers on its programme.  (I can't claim any responsibility for this, since while my organisation has helped with planning the conference, my small personal contribution has just been a few suggestions about the web presence.)

Many of the names you may recognize from enthusiastic blog postings of mine, so as you can imagine, I'm looking forward to going.  Speakers mentioned in this blog (with a link to the relevant posting) include

The event will be June 9-10, 2009 at Library and Archives Canada in Ottawa.  Early registration ends March 31.  I think it will be a great opportunity to discuss science data, e-science, and the roles that our libraries can play.  I am using tag icsti2009.  I hope to see many of you there.



revisiting potential research-support roles for the library

Three years ago, I wrote this list of potential research-support roles for a library in the digital environment:

  1. institutional repository for pre-prints and post-prints of the research organization's publications
  2. data repository for the research conducted at the organization
  3. providing advanced (data/publication/information/discovery/etc.) tools that integrate into the researcher's workflow

These are numbered for convenience, not importance.
What do I think, three years on?

Institutional Repositories

1. While institutional repositories are valuable, they currently benefit primarily organisations, not researchers.  They provide a unified view of an organisation's published output.  For individual researchers, their priority may be just on getting published, or if they do want to disseminate their work, they may just post it to their own website (and sad to say, may get more Google rank having it there than in their repository).

Because of this property, there is still a huge content recruitment challenge for IRs.  I saw this at SPARC Digital Repositories 2008 where, to be blunt, the tone seemed to be mostly "we built it and they didn't come".  And in fairness to individual organisations, even Wellcome with its billions and its mandatory policy isn't getting good compliance:

The Wellcome Trust have been monitoring compliance rates, and have been disappointed to find that these are currently very low. As a result of this, they intend to more actively monitor compliance, and in future will be contacting researchers who have not had articles published as Open Access papers.

Wellcome gets tough on Open Access depositions - Peter Murray-Rust's blog - March 7, 2009

Even if you just look at the language we use - "recruitment", "compliance" - it's clear that IRs have become about coercion, which should be making us seriously question their value.  The good news is that there is a lot of good thinking about this - for example Les Carr suggests the idea of making the repository a file system for researchers, and many have suggested making repositories more web-friendly (or eliminating this special container we call IRs altogether, and just using regular web tools).

If providing an institutional repository is your primary or core value to the organisation, you are putting yourself at tremendous risk, because a savvy administrator may notice that you can purchase hosted repository services from BePress and BMC Open Repository.  Any time a primary function (however valuable) has become commodity, you are at risk.

Data

2. Data is a strange thing.  Unlike the publisher resistance to article repositories, there is pretty much universal agreement amongst all parties that data should be openly shared.  There are many reasons it is mostly not being shared.  Data can have very complicated licensing.  By its very nature, it is complex to manage and interpret.  And researchers who are, to be blunt, somewhat indifferent to sharing their papers, may actively resist sharing their data as they may feel it is the foundation of their future research.  There's lots of good work being done - just today Peter Murray-Rust points to some practical developments in Open Data in Science - and John Wilbanks and his team have been doing deep and valuable work on data licensing as part of Science Commons (see e.g. Databases and Creative Commons), but we are a long ways away from massive, agreed-upon sharing and preservation of data.  Also a risky area in which to bet your organisation, but a good area to be doing small, practical experiments in data sharing and preservation with willing researchers.  Canada unfortunately lacks an equivalent of the UK's national Digital Curation Centre to help make this happen here.  There is an effort to gather information as part of Research Data Canada, but I don't know how widely known it is.

This is an activity that will have great value, once all the hugely complicated issues begin to be resolved.  Data is very different from journal articles - it lacks a standard format, and the resources it can consume - into the petabytes - make it a daunting task for any organisation or set of organisations to take on.  I really admire the practical work that Amazon is doing with Public Datasets (thanks I suspect in large part to the vision of Deepak Singh).  The most practical things we can do right now are to share what data we have, think about what open data will mean, and try to get more and more data openly shared.

Advanced Discovery and e-Science

3. This is an important area that I think offers enormous potential for libraries.  In Canada it is also hugely challenging because we have no national equivalent of the US NSF Cyberinfrastructure or the UK National e-Science Centre.  The best we can do is kind of grassroots e-science, which is kind of a contradiction in terms, since the common understanding of e-science is that it is about tackling large scale problems with large scale infrastructure.

Where I think things are possible is on the smaller scale, building and integrating advanced discovery and integration with researcher workflows piece-by-piece.  (This shouldn't be read as "build all" - integrating includes e.g. helping researchers integrate Connotea, Zotero, etc. into their workflows.)  Many researchers are not that web-aware beyond Google searching - there are all kinds of tools that they could use.  The library has a role in providing information about those tools.  In the near term, there are some very quick wins just providing better discovery and information management tools, most of which are already available for free on the web.  In the medium term, there are intriguing possibilities to support researchers with Virtual Research Environments.  And in the long term, true semantic discovery may be possible, with very advanced computational and visualisation tools supporting very sophisticated computer- and data-driven science.

Many pieces of this environment are being built.  The library has a key role in integrating them and educating researchers about them.  As indicated above, this is everything from

basic citation management - Connotea, Zotero and many others
to
Virtual Research Environments as being investigated by JISC and the British Library (PDF)
to
text mining on full-text, as planned by UKPMC
to
semantic discovery as is being pioneered by EMBL, Biogen Idec library, and many others in many fields (too many to list, but just in biomed see e.g. Semantic Mining in Biomedicine Symposiums and "Pharmas Nudge Semantic Web Technology Toward Practical Drug Discovery Applications")

As you can see this is an exciting space with many activities going on.  The (research) libraries that can have a meaningful presence in this space (which currently has some daunting technical and infrastructure requirements at the high end) will, I believe, be able to sustain themselves by providing truly relevant and valued services to their researchers.

An important point must be made here: if you don't have some point of connection with your researchers - some discovery tools on your site and in their browser that the library provides - then you have no point of contact or credibility upon which to base all the advanced capabilities you may want to bring to bear.

UPDATE: I wanted to add some closing thoughts about the focus of this post.  I'm a technology planner (that's a large part of the meaning of the rather grander "enterprise technology architect" job description I have).  That means my main focus is on the technologies the organisation uses.  Not the specific implementations (DSpace vs. Fedora) but the general classes of technology-enabled business functions that the organisation provides.  So what I'm working through above is what kinds of approaches will be sustainable technology differentiators.  That is, where can your library add technology-supported value that will be recognised by researchers?  This has some implications for the people roles, the jobs the librarians would do, but I'm not examining that aspect.  ENDUPDATE

Some of the topics about data and e-science that I have discussed above will be covered in the ICSTI 2009 conference in Ottawa this June (about which more in the following posting).

March 05, 2009

data.gov is coming - Vivek Kundra named US Federal CIO

Vivek Kundra on data.gov and the Imperative to Distribute Data

VK: One of the things we want to do is embark on launching data.gov which would democratize data and give data access to the public and based on that challenge whether it is citizens, NGOs the private sector to help us think through how we address some of the toughest problems in the public sector.

VK: Data.gov will publish data feeds, so we'll have a vast array of data, and the way I like to think about this is that if you think of two forms of data that have been published in the federal government that have fundamentally transformed the economy. One example is the National Institute of Health working with other world bodies when they published the Human Genome Project data online. What that did is it created an entire revolution in personalized medicine where you ended up having over 500 drugs that were created and that are in the pipeline coming into the FDA.

VK: Second, is what happened in the geospatial community when the defense department decided to release data around satellites you created this GPS revolution where now you could go to your local car rental company and get a GPS device or your iPhone and get directions.

Vivek Kundra: Federal CIO in His Own Words - O'Reilly Radar - March 5, 2009

So... who is Canada's Vivek Kundra?

UPDATE 2009-05-22: Data.gov has launched.

February 11, 2009

Making government data visible - and is Change coming to Ottawa?

I went to the event about open government; it was good.  (It was a "meetup" I guess, in the popular terminology, or a new term I learned, 5à7.)

Jennifer Bell of VisibleGovernment.ca did a presentation that I liked a lot, about mobilising government data by opening it up for public engagement through APIs.

Her presentation should show up at

http://www.slideshare.net/jenniferbell

In the meantime the presentations currently there give an idea of the concepts and objectives.

UPDATE 2009-02-12: Her presentation Benefits of Open Government Data is now available, and I also found an article by her that covers some of the same ground

Bell, J. 2009 Feb 1. Government Transparency via Open Data and Open Source. Open Source Business Resource [Online] 0:0. Available: http://www.osbr.ca/ojs/index.php/osbr/article/view/829/802

ENDUPDATE

Given my experience trying to sell the idea of APIs and open data to a community that mostly understands plain web pages, this is a challenging concept to promote, but I think there are some good US and UK examples that we can point to (about which more later).

Having learned from the experience of twitter-spamming people by live-tweeting an event, I made a FriendFeed room for my live notes instead.  I don't know how well it turned out.  Because of screen-size limitations on my 7" netbook, it was much easier to keep creating new top-level items than to make one long section of comments.  I'm not sure if this mix is right; maybe they all should have been top-level items, or all comments.  Anyway the room is at

http://friendfeed.com/rooms/open-government-canada

In the spirit of the event, it's currently completely open.  You're welcome to add any relevant items; they don't have to be specific to this event.

Change Coming to Ottawa - Part 0

Jennifer also mentioned the possibility of a ChangeCamp Ottawa (for background, see my previous posting about ChangeCamp Toronto).  It looks like a good way to connect is through the ChangeCamp Twitter account,

http://twitter.com/changecamp

and the rather long hashtag #changecampottawa

Change Coming to Ottawa - Part 1

New web technologies face many adoption challenges in the Canadian Federal Government: official languages rules, Common Look and Feel rules, slow tech adoption and a risk-averse culture.  That being said, there is a lot of excitement about the potential of these tools both internally, in an Enterprise 2.0 sense, and externally, in a Government 2.0 sense.  As I said in my Web 2.0 presentation at CISTI, in some ways I find it hard to get particularly excited when this is stretched to include blogs and wikis, which are actually both pre-Web 2.0 technologies (as you can see from the timeline I posted on FriendFeed) that are now quite mature.

That being said, the "Standard Set" of modern web tools (Twitter, Flickr, YouTube, Facebook) is getting deployed (perhaps somewhat quietly) on Canadian Government websites.  The example I have been using is the Prime Minister's Official site, under Family Centre - Social Networks.

Part of what prompts this section of my posting is that there was a recent "Social Media for Government" event in Ottawa (unfortunately rather expensive).  Via the Twitter tag #ali I see that the Canadian Afghanistan mission is also using the social media tools: Flickr, YouTube, Facebook (and RSS feeds)

http://www.afghanistan.gc.ca/canada-afghanistan/multimedia/index.aspx?lang=en

Change Coming to Ottawa - Part 2

In addition to the internal drivers of change, the entire government information environment is changing worldwide.  Whether it's phenomenal reports like the Power of Information Taskforce in the UK, or President Obama releasing principles of openness on WhiteHouse.gov in his Memorandum on Transparency and Open Government

  • Government should be transparent
  • Government should be participatory
  • Government should be collaborative

or declaring in his Memorandum on the Freedom of Information Act that

The presumption of disclosure also means that agencies should take affirmative steps to make information public. They should not wait for specific requests from the public. All agencies should use modern technology to inform citizens about what is known and done by their Government. Disclosure should be timely.

Of course, words alone won't achieve these goals, and people like Don Tapscott have been talking about transparency for years (see e.g. "turn your organization inside-out" and Tapscott keynoting about transparency at SLA 2005).  What is changing is that more people with IT expertise and a passion for engaging the public are gaining prominence.  One great example is Vivek Kundra, who is rumoured (but AFAIK not yet announced) to be the Obama administration's choice for Office of Management and Budget administrator for e-government and information technology (an obscure title, but one that is as powerful as, if not more powerful than, the government CTO).  Even if Kundra just stays working for the DC city government, he's already done some great things:

47 Applications in 30 Days for $50K

Like many cities, Washington, D.C., collects a vast amount of data and metrics about its people and operations including both realtime (or near realtime) data as well as relatively static information such as neighborhood demographics. Much of that information is now available to the public through more than 200 data feeds accessed through the Office of the Chief Technology Officer's Web site.

Kundra's idea was to use a competition to encourage developers to use those feeds to create applications for the public good.

You can see the competition site at

http://www.appsfordemocracy.org/

I think these kinds of open government initiatives are pathfinders that show us ways in which governments at all levels - federal, provincial, municipal - can open up and engage citizens.

February 05, 2009

UK Power of Information Taskforce Report (beta)

This week, the [UK] government's Power of Information taskforce set out a list of 25 urgent actions for the public sector machine - from Downing Street to local councils and NHS organisations - to take to embrace social networking, blogging and other such phenomena.

Top of the list is a relaxation about civil servants accessing - gasp - social media at work. ... "Public sector workers cannot be expected to be up to date with the power of information to transform public services if they cannot access the internet at work," the report says.

Equipped with this access, public servants should as a matter of course engage with online peer support forums concerned with their areas of work. (It notes that some sites "clearly would not welcome such intervention".) Civil servants should also "innovate and co-create with citizens online".

Are government ministers allowed to use social media? - The Guardian - February 5, 2009

via LibraryStuff

The report itself (in beta, open for comments for two weeks starting February 1, 2009) is available at http://poit.cabinetoffice.gov.uk/poit/

There are also some short videos on YouTube, and a blog that gives some information about the process of putting together the report, along with links to their delicious bookmarks and other information.

As I read them, the report's recommendations are about using a combination of technology, open data (that is, opening up government data to its owners, the public), and direct engagement with the public in order to engage citizens more rapidly and more deeply.

It would seem to me that libraries and librarians, as traditional points of interaction between the public and information, could play a useful role.  I think advocating and supporting "opener access" is an important library role (one sometimes compromised by library acceptance of DRM or restrictive licensing terms).

February 04, 2009

open government in Ottawa - Feb 11, 2009

Open source, open data, open government. Everyone is talking about it, and now you can hear how Governments around the world are gaining citizens' trust through transparency, and just plain making citizens' lives better by putting information online.

Presentation Summary & Bio

... We'll discuss the benefits of publishing government information in open, structured formats, and look at examples from around the world of these concepts in action.

Jennifer Bell is Executive Director of VisibleGovernment.ca, a non-profit that promotes online tools for government transparency.

Date: 11 February 2009
Time: 17:00 - 19:00
Location: Fresco Bistro Italiano
Street: 354 Elgin
City: Ottawa, ON

from barcampottawagov group

A signup page is available on Facebook - VisibleGovernment.ca @ Fresco's on Elgin

December 16, 2008

too much information? our unprecedented ability to gather data about everything

In Boing Boing I found this interesting article

Over at Kevin Kelly and Gary Wolf's Quantified Self blog ("Tools for knowing your own mind and body") guest blogger Alexandra Carmichael explains how she keeps a record of 40 different things in her life every day, and what she's learned about herself from studying the data.

Daily tracking of 40 things about yourself - Boing Boing - December 15, 2008

Following the comments (I think) led me to a Wall Street Journal article, The New Examined Life (December 8, 2008)

In the first week of January, New York graphic designer Nicholas Felton will boil down everything he did in 2008 into charts, graphs, maps and lists. The 2007 edition of his yearly retrospective notes that he received 13 postcards, lost six games of pool and read 4,736 book pages. He tracked every New York street he walked and sorted the 632 beers he consumed by country of origin.

Apparently he got so much interest in his professionally-presented yearly results that

they have become so popular that he recently launched a Web site with his friend Ryan Case called Daytum, which helps fellow chroniclers track the details of their own experiences.

(Daytum is currently in request-an-account beta.)

This is just part of a much larger trend.  I see it with my friends who mountain bike: they use their cycling computers to gather incredible amounts of data, which they then chart in various ways and use as part of their training plans to quite literally do data-based tuning of their own bodies.  Another example of this is the NikePlus site that supports the iPod+Nike running data system.

"Citizen data" also has huge implications for everything from science, through to mapping (for example, the Open Street Map project), and beyond.

I wonder what it may mean for medicine, science, sports, and other fields if we eventually have hundreds of millions of people gathering and sharing detailed information about themselves and their environment.

September 26, 2008

library support for open science

A nice overview of the challenges and opportunities for academic libraries as we plan to support 2020 science.

Some calls out to the Microsoft Towards 2020 Science report as well as various open science and science data initiatives.

From a presentation to the British Library Board on an Awayday (a charming term) by Carole Goble.
If the embedded presentation isn't working, you can also try it directly on Slideshare, or download the PowerPoint.

via FriendFeed

Previously:
March 23, 2006  various ideas about the future of science and computing

March 20, 2008

Open Repositories 2008

Through an unexpected series of events I find myself going to Open Repositories 2008

http://or08.ecs.soton.ac.uk/

The lineup looks great, including a keynote from Peter Murray-Rust and two (!) sessions on Scientific Repositories.

There is also a Repository Challenge for developers with a £2,500 prize, which is like a million US dollars now (finally, Canadians get to make US dollar jokes).  Kudos to David Flanders for leading this "let's just build stuff and see what works" approach.

I will be blogging under tag/category or08, and twittering under hashtag #or08

I made an Upcoming event, mainly because then if you add the machine tag

upcoming:event=455039

to your Flickr photos, it will automatically put in a nice "Taken at Open Repositories 2008" logo.
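
For anyone wondering what makes that tag "machine" readable: a machine tag is an ordinary tag written as namespace:predicate=value, so software can pull the three parts apart.  Here is a minimal Python sketch of that split.  The names MachineTag and parse_machine_tag are my own, made up for illustration; this is not Flickr's or Upcoming's actual code, just the general idea.

```python
from typing import NamedTuple


class MachineTag(NamedTuple):
    """Hypothetical container for the three parts of a machine tag."""
    namespace: str
    predicate: str
    value: str


def parse_machine_tag(tag: str) -> MachineTag:
    """Split a tag of the form namespace:predicate=value into its parts.

    Illustrative helper only (not part of any Flickr or Upcoming library).
    Raises ValueError for an ordinary tag that lacks the ':' or '=' parts.
    """
    # Everything before the first '=' is "namespace:predicate".
    prefix, equals, value = tag.partition("=")
    namespace, colon, predicate = prefix.partition(":")
    if not (equals and colon and namespace and predicate):
        raise ValueError(f"not a machine tag: {tag!r}")
    return MachineTag(namespace, predicate, value)


tag = parse_machine_tag("upcoming:event=455039")
print(tag.namespace, tag.predicate, tag.value)
```

An ordinary tag such as or08 has no namespace or predicate, so the sketch rejects it rather than guessing.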

February 11, 2008

The Agenda on Microhoo and the Compute Cloud

The Agenda had their somewhat-usual technology suspects on talking about the proposed Microsoft-Yahoo merger, with a majority of the show devoted to the idea of the "compute cloud" future for computing.  It's quite impressive that they took this fairly technical topic on, and they did a good job of covering it from various angles.

The Debate: The Coming Cloud (switch to the Mark Evans tab for the other discussion) - video is linked from these pages, just click on "Watch video" there

also available as iTunes audio and video - I'm not seeing it in iTunes yet though

Overall I liked the show, but I would have liked to have seen a cloud computing user, rather than just a panel of pundits.  Show me someone who has moved their enterprise over to Amazon EC2/S3 or other cloud services.  (For example, Internet Archive has been experimenting with this... and I see I'm the top hit for this information: "Science Library Pad: Internet Archive 20th Century Search".  Also SmugMug photos uses Amazon S3 storage.)

I think the future splits into multiple models of computer use.  Gamers, for the near term, need local graphics engines and local storage (holding the multi-gigabyte virtual environments they use).  The intensive computer users like me probably still have their whole elaborate local network and local storage and local computing... well, basically an entire personal data centre.  We're probably the only ones left with a lot of non-cloud data and computing.

The digital dividers (old people, poor people, the technically unsavvy) will have very simple devices, something very akin to thin clients - probably in many different form factors - built in to televisions, set-top boxes, things like OLPCs and Eee PCs, "intelligent LCD displays".  The highly mobile will have quite sophisticated but completely mobile devices.  All of the data for both groups lives in the cloud.

This being said, there is a very, very long history predicting the demise of the PC and its replacement with set-tops and thin clients, and it has yet to materialize.  People use a bunch of devices (cell, camera, PDA, laptop) AND their home computers, not instead of their computers.

SIDEBAR: Jesse Hirsh had quite the slag on for the Preventers of Information Services in IT Departments.
First he says home users can't be trusted with personal computers, and then he says work users must be trusted with unlimited use of Internet applications.

It is true that some of the Dr. No aspect of IT is arbitrary, but some of it is out of their control (layers of regulations imposed from on high), and some of it is related to user support.  IT is about user productivity.  Computer secure, applications running smoothly = happy IT.  If this could be guaranteed through the magic of trusted cloud computing, that would be fine.  But the reality is, users download a bunch of cr*p and access a bunch of cr*p websites, and then IT has to come in and try to clean it up.  That's why IT tries to lock down.  Lockdown is about being able to guarantee a stable computer, network, and sustainable support experience.

If you want to see what happens in an uncontrolled environment, just let a bunch of consultants into your organisation and let them "manage themselves" and see how well that works...

LimeWire led to data breach: N.L. justice minister

an outside consultant had installed LimeWire, a popular program used to swap music for free, on a laptop computer that was being used to work with data for the Workplace Health, Safety and Compensation Commission.

As a result, information — including names, addresses, dates of birth and medical and work histories — related to 153 individuals was exposed

END SIDEBAR

SIDEBAR 2: A minor quibble with terminology used during the show: Amazon's S3 is cloud storage only; their compute cloud service is EC2.  END SIDEBAR

February 08, 2008

CNI Fall 2007 presentations, podcast

Lots of interesting material from fall CNI.

An audio interview with Birte Christensen-Dalsgaard, Director of Development at the State and University Library in Aarhus, Denmark about the Summa search system and other academic library topics.

Current Experiments & Future Directions in Scholarly Communication - Timo Hannay, Nature

The eCrystals Federation: Open Data Repositories Supporting Open Science
Liz Lyon, University of Bath
Simon Coles, University of Southampton
Manjula Patel, University of Bath

Summa podcast link via DigitalKoans

Previously:
October 26, 2006  the future of the scientific paper and more on open web science (Timo Hannay)
March 31, 2006  presentations on e-Science and e-Biz workflow, and research data preservation (Dr Liz Lyon)
October 11, 2005  ILI2005 - Tuesday 11th - Living with Google: New roles for libraries (Birte Christensen-Dalsgaard)
September 27, 2005  Info Grid 2005 - Tuesday 27th, 09:00 - Developing e-infrastructure to support new research and learning paradigms (Dr Liz Lyon)

February 06, 2008

getting HEP to scholarly infrastructure

As an appropriate follow-on to my previous post thinking about domain-specific sites on the net, Rolf-Dieter Heuer, CERN DG-elect, shows us what a High-Energy Physics (HEP) e-Infrastructure for Scientific Communication may look like:

1. Build a complete HEP information platform
2. Enable text- and data-mining applications
3. Demonstrate and deploy Web2.0 applications
4. Preservation and re-use of research data

www.scoap3.org/files/APE2008-Heuer.pdf

As you might expect from the URL, there is also some discussion of the SCOAP3 initiative.

Although there are of course aspects that are unique to the HEP community, there are also lots of ideas that are generally applicable to domain portals for other areas of science.

Previously:
June 11, 2007  IATUL 2007 - June 11 - Dr. Rüdiger Voss - Open Access - SCOAP3
June 11, 2007  OA and repositories : beyond green and gold - Jens Vigen - June 11 - IATUL 2007

November 19, 2007

open science and the web for research library 2.0?

Peter Murray-Rust points me to Dr. Liz Lyon's keynote for a November 2007 ARL Directors meeting

Open Science and the Research Library: Roles, Challenges and Opportunities?

I saw her present at InfoGrid 2005 and I've downloaded subsequent presentations; this one is more Web 2.0 centric than others I have seen.  She has done a lot of deep thinking about the challenges and opportunities related to dealing with scientific data in our new cyberscience world.

Along related lines, Bernard Dumouchel (former CISTI DG) wrote a short comment asserting that supporting open science was a key possible future role for the academic library.  It was written for the September 2006 ARL Task Force on Library Support for E-Science ARL/NSF Workshop on New Collaborative Relationships: The Role of Academic Libraries in the Digital Data Universe.  The paper is

New Collaborative Relationships: The Role of Academic Libraries in the Digital Data Universe - CISTI submission (PDF)

Unfortunately ARL moved all the files from this event, breaking all the previous links to presentations and reports that I had made.  Data stewardship irony, no?

This is the new site they made; it has working links

http://www.arl.org/pp/access/nsfworkshop.shtml

Previously:
March 31, 2006  presentations on e-Science and e-Biz workflow, and research data preservation
February 15, 2006  roles and challenges for the academic library in e-Science
September 27, 2005  Info Grid 2005 - Tuesday 27th, 09:00 - Developing e-infrastructure to support new research and learning paradigms

October 19, 2007

CLIR cyberinfrastructure short articles

The Council on Library and Information Resources (CLIR) has a publication which I guess is called CLIR Issues, some recent... issues it has covered in its... issues include:

here's a section from "As We May Rethink":

The new cyberinfrastructure calls into question many of the methods and procedures with which we have worked for the past two decades. We have become comfortable with the technology, and execute much of our work using familiar applications on indispensable machines. CI is, in essence, an environment that facilitates sharing of data on an unprecedented scale, which in turn implies a far greater degree of federation, aggregation, and interoperable capabilities than we have heretofore experienced. It demands new kinds of expertise, which will require new forms of training and mentoring to recognize and respond to changing research behaviors. While the transformational potential of CI on higher education is not difficult to intuit, the details of this transformation have yet to be defined, and remain ambiguous.

It is precisely this ambiguity that allows us to explore the multiple possibilities of developing a functional and robust cyberinfrastructure and to create this new environment in the most flexible and nuanced fashion possible. Succeeding in the evolving CI will require that we thoroughly rethink our procedures and expectations on the technical as well as the social levels, for the technical and social are deeply interrelated in cyberinfrastructure.

Consider, for example, the sheer enormity of data to be supported. Many of the vast data sets are relatively new—not only in the humanities, with its large full-text and video databases, but also in astronomy and particle physics. Challenges such as data mining, semantic searches, multimedia data stewardship, and interoperability are common to all disciplines. This suggests that forward-looking researchers and scholars will need to exchange ideas and CI requirements for their mutual benefit.

There is little precedent for this kind of interdisciplinary dialogue. Past practice, characterized by a focus on traditional disciplinary purviews, silos of funded projects, and poor communication among researchers across intellectual boundaries, is at odds with the conceptual underpinnings of CI. In this respect, the past should not be prologue: our traditional methods of doing business and conducting research, as well as our systems of professional advancement, may undermine our best intentions unless we recognize the limitations of the academic procedures that have brought us to a point of new awareness.

global research data library - GRL2020 and PV 2007 presentations online

Global Research Library 2020 - GRL 2020 - http://www.lib.washington.edu/grl2020/presentations.html

PV 2007 - Ensuring the Long-Term Preservation and Value Adding to Scientific and Technical Data - Proceedings

Some presentations of interest:
