
April 01, 2008

Microsoft Summit on Repository Interop - notes

April 1, 2008 - I had read the posting by Savas (probably via Lorcan), so it was great to have an opportunity to hear about Microsoft's thinking directly from them.  The most dramatic announcement was that Microsoft Research will be developing entirely on the Linux platform.

UPDATE: Lee Dirks said I almost gave him a heart attack with my little April Fools' prank, and the day is wearing on, so it's time to update and move my text up from the bottom...

Thanks go to Lee Dirks and David Flanders for making my first full day in Southampton a very interesting one.  The Linux platform bit was my contribution to April Fools.  MS Research Tech Computing are, of course, entirely dedicated to Microsoft platforms.  ENDUPDATE

For further discussion of the MS Repository Platform efforts, they have created a group:

http://community.research.microsoft.com/forums/90.aspx

I'm sure it has happened before, but it was the first time I had seen the leads/directors of Fedora (Sandy Payette), DSpace (Michele Kimpton) and Eprints (Les Carr) brought together.

There was a lot about SWORD and also some on OAI-ORE.

Notes on Microsoft Summit on Repository Interoperability event

Lee Dirks
External Research, Technical Computing
- Putting computing into science
- Putting science into computing

Science + computation are not the entire equation
* Microsoft must improve its offerings throughout the scholarly communication lifecycle

Approach: Conduct prototyping projects and proofs-of-concept to evolve Microsoft's scholarly communication offerings

Five factors Microsoft considers key
* Interop is paramount
* Optimize for data-driven research & science
* Data preservation (and provenance) should be baseline
* Community protocols & conventions
* Social networking & semantic knowledge discovery

When possible, IP is shared at
http://www.codeplex.com/

Project Execution Models
* internal FTE
* external devel (vendor)
* external devel (institutional partner)
* mixed models

projects 1-2 years

Examples:
* GenePattern for Word 2008
- integrate data and images from GenePattern workflows into research papers
- will move into production in April/May 2008

* Math in Word 2007

* Chemistry Drawing for Office 15
- Peter Murray-Rust et al.
- Chemistry Markup Language (CML)
- proof-of-concept plugin ... but two versions of Office from now, Chemistry will be built-in (we hope)

* PLANETS
- EU project
- preservation of Office documents based on Office OpenXML (OOXML)

===

Savas
"Supporting researchers worldwide"

working towards an "eResearch Platform", a grouping of Microsoft tools that can support research

Flow: Author->Publish->Archive->Discover

Author
* Semantic Annotations for Word
(current target: protein databank)

* NLM DTD plug-in - will support SWORD
- export a Word document in NLM DTD -> .nlmx

* Research Ribbon concept - tools relevant to researchers in Office

* can search arXiv from within Word using OpenSearch
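
To make the OpenSearch point concrete: a client such as a Word add-in only needs to fill in a URL template published by the search service. Below is a minimal sketch of that substitution step. The template URL is hypothetical (example.org, not arXiv's real endpoint), and fill_template is my own illustrative helper, not anything Microsoft showed.

```python
import re
from urllib.parse import quote

# Hypothetical OpenSearch URL template. A real one would be read from the
# <Url template="..."> element of the service's OpenSearch description.
TEMPLATE = "http://example.org/search?q={searchTerms}&start={startIndex?}"


def fill_template(template, **params):
    """Replace {name} and optional {name?} placeholders with encoded values."""

    def substitute(match):
        name = match.group(1)
        optional = match.group(2) == "?"
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(name)  # required parameter was not supplied

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)


url = fill_template(TEMPLATE, searchTerms="repository interoperability")
```

The response would normally be an Atom or RSS feed, which is what makes this easy to embed in an authoring tool.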

Publish
* Conference Management Tool (also SWORD endpoint)
* eJournal - manage peer review (also SWORD endpoint)

Archive
* Research Output Repository (also SWORD endpoint and will support OAI-ORE)
* arXiv (also SWORD support)

? Repository interop/federation

Q: Shibboleth / OpenID support?
A: haven't started looking at it yet

===

Santosh
Microsoft's Research Output Repository Platform

Platform for storing scholarly works and metadata
- papers, videos, presentations, lectures, references...
- enables the development of new functionality and services on top of the platform
- relationships between stored entities

* SQL Server 2005 or 2008, Entity Framework, .NET 3.5

* the repository software (but not the servers) will be available to the community for free

Platform Overview
- variety of resource types (publications, tech reports etc.)
- resource tagging
- relationship between resources (triple-based)
- set of well-known predicates (IsVersionOf, Contains, etc.)
- new resource types and predicates through extensibility
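
As a rough illustration of the triple-based relationship idea, here is a toy in-memory sketch. It is not the Microsoft platform's API (which is .NET on SQL Server); the class and method names are my own, and only the predicate names IsVersionOf and Contains come from the talk.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate obj")


class TripleStore:
    """Toy store: resources linked by (subject, predicate, object) triples."""

    def __init__(self, predicates=("IsVersionOf", "Contains")):
        # Start from a set of well-known predicates.
        self.predicates = set(predicates)
        self.triples = set()

    def register_predicate(self, name):
        # Extensibility: new predicates can be added at runtime.
        self.predicates.add(name)

    def add(self, subject, predicate, obj):
        if predicate not in self.predicates:
            raise ValueError(f"unknown predicate: {predicate}")
        self.triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        # None acts as a wildcard for that position.
        return sorted(
            t for t in self.triples
            if subject in (None, t.subject)
            and predicate in (None, t.predicate)
            and obj in (None, t.obj)
        )


store = TripleStore()
store.add("paper-v2", "IsVersionOf", "paper-v1")
store.add("proceedings", "Contains", "paper-v2")
store.register_predicate("Cites")  # hypothetical extension predicate
store.add("paper-v2", "Cites", "tech-report-7")
```

Asking for everything about one resource is then store.match(subject="paper-v2"), and asking what contains it is store.match(predicate="Contains", obj="paper-v2").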

Platform
* Core API
* Framework API
* OAI-PMH, Syndication, BibTeX, Search
- UI Web Controls

"A semantic computing platform"
- hybrid between relational database and a triple store

http://community.research.microsoft.com/forums/90.aspx

===

Stuart Lewis
Update on SWORD Protocol & Future Directions

http://www.ukoln.ac.uk/repositories/digirep/index/SWORD

- Simple Web Service Offering Repository Deposit

JISC/CETIS end of 2005
- identified lack of standard deposit API as #1 issue

2006: Creation of Repository Deposit working group

November 2006
- JISC call for funding, bid submitted for SWORD
- Julie Allinson
- lightweight and agile project

Workpackage 1: Evaluate existing standards
- WebDAV
- JSR
- OKI OSID
- ECL
- SRW Update
- SPI Google Data API
- Atom Publishing Protocol (APP)

-> page on wiki examining them all

Workpackage 2: Tech Dev
- DSpace
- Fedora
- Eprints
- intraLibrary
* Java client library
- command line, desktop app, web interface

Workpackage 3: User testing and feedback
- arXiv
- SOURCE
- SPECTRa
- White Rose Research Online
- FeedForward

How does SWORD work?
* Two stages
- Discover
GET a Service Document
- Deposit
POST an item to the URI of the collection

GET
- X-On-Behalf-Of
- get a URI

POST
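
The discover stage can be sketched as follows: fetch the Service Document, then pull out the collection URIs that accept deposits. This is a minimal sketch that parses an inline sample rather than hitting a server. The repository URLs and collection names are made up, and I am assuming the standard APP and Atom namespaces.

```python
import xml.etree.ElementTree as ET

APP_NS = "http://www.w3.org/2007/app"    # Atom Publishing Protocol namespace
ATOM_NS = "http://www.w3.org/2005/Atom"  # Atom Syndication Format namespace

# Hypothetical Service Document, as a GET on the service URI might return it.
SAMPLE_SERVICE_DOC = """
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Example Repository</atom:title>
    <collection href="http://repo.example.org/sword-app/deposit/preprints">
      <atom:title>Preprints</atom:title>
      <accept>application/zip</accept>
    </collection>
    <collection href="http://repo.example.org/sword-app/deposit/theses">
      <atom:title>Theses</atom:title>
      <accept>application/zip</accept>
    </collection>
  </workspace>
</service>
"""


def list_collections(service_xml):
    """Return title, deposit URI and accepted types for each collection."""
    root = ET.fromstring(service_xml.strip())
    found = []
    for coll in root.iter(f"{{{APP_NS}}}collection"):
        found.append({
            "title": coll.findtext(f"{{{ATOM_NS}}}title", default=""),
            "href": coll.get("href"),
            "accept": [a.text for a in coll.findall(f"{{{APP_NS}}}accept")],
        })
    return found


collections = list_collections(SAMPLE_SERVICE_DOC)
deposit_uri = collections[0]["href"]  # the deposit stage POSTs the item here
```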

SWORD extensions to APP
* SWORD level
- 0
  - basic
- 1
  - full implementation

- X-On-Behalf-Of
- X-Verbose
- X-No-Op
- X-Format-Namespace
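
A minimal sketch of how a client might assemble these headers for a deposit POST. The function and parameter names are mine, and the "true" values and the ZIP content type are assumptions to check against the SWORD profile; only the four X- header names come from the talk.

```python
def deposit_headers(on_behalf_of=None, verbose=False, no_op=False,
                    format_namespace=None, content_type="application/zip"):
    """Build the HTTP headers for a SWORD deposit request."""
    headers = {"Content-Type": content_type}
    if on_behalf_of:
        # Mediated deposit: the authenticated user deposits for someone else.
        headers["X-On-Behalf-Of"] = on_behalf_of
    if verbose:
        # Ask the server for a more detailed description of what it did.
        headers["X-Verbose"] = "true"
    if no_op:
        # Dry run: process the request but do not actually store the item.
        headers["X-No-Op"] = "true"
    if format_namespace:
        # Identifies the packaging format of the POSTed body.
        headers["X-Format-Namespace"] = format_namespace
    return headers


headers = deposit_headers(
    on_behalf_of="author42",                      # hypothetical user name
    no_op=True,
    format_namespace="http://www.loc.gov/METS/",  # METS namespace
)
```

The resulting dict can be handed to whatever HTTP client is in use, together with the package bytes and the collection URI from the Service Document.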

Discovering SWORD interfaces
* Recommend /sword-app
* Recommend /sword-app/servicedocument
* Recommend <link rel="sword" href="/sword-app/servicedocument" />
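
The recommendations above suggest a simple client-side lookup: scan a repository's HTML page for the link element, and fall back to the conventional path if none is present. A rough sketch follows. The example.org host is hypothetical and the fallback behaviour is my own assumption, not part of the recommendation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SwordLinkFinder(HTMLParser):
    """Collect href values from <link rel="sword" ...> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower()
        if tag == "link" and rel == "sword" and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_service_document(page_url, html):
    """Return the absolute Service Document URI advertised by a page."""
    finder = SwordLinkFinder()
    finder.feed(html)
    # Assumption: fall back to the recommended conventional path.
    href = finder.hrefs[0] if finder.hrefs else "/sword-app/servicedocument"
    return urljoin(page_url, href)


page = '<html><head><link rel="sword" href="/sword/sd" /></head></html>'
service_uri = find_service_document("http://repo.example.org/about", page)
```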

Authentication
- Required: HTTP BASIC
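
For reference, HTTP Basic is just a base64-encoded "user:password" pair in the Authorization header, so a client can build it by hand. A minimal sketch (the helper name is mine):

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


auth = basic_auth_header("Aladdin", "open sesame")
```

Since base64 is an encoding and not encryption, this only makes sense over HTTPS.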

What?
- any package supported by the repository
- DSpace/Eprints: ZIP files with a METS manifest in SWAP format, with files
- Fedora: image files / METS documents (pull in referenced data streams)
- OAI-ORE resource maps

SWORD 2
- follow-on project
? more APP
? UPDATE / DELETE
? more clients
? client libraries
? provide support to users

Q: What is relationship with APP?
A: none

Comment: Sandy - We need a basic protocol that supports read and write.
Comment: Michele - We need to get into workflow - Zotero, EndNote etc.

Q: OAI-ORE and SWORD together?

===

Experience implementing SWORD at arXiv.org
Simeon Warner
Thorsten Schwander

1. Background
2. SWORD implementation choices
3. Ideas for SWORD evolution

automating from Microsoft Conference Toolkit

CS unusual in that conference publications very important
- use arXiv to host open access proceedings

work internally at arXiv to present conference proceedings as a whole

http://arxiv.org/help/api

Authority
1. author
2. the conference organizer
3. the CMT system (will use the organizer's authority)

returning errors
- all additional errors returned HTTP 400 Bad Request
- return an Atom document for each error code

3. Ideas for SWORD evolution

* Primary goal should be to reduce pairwise customization

- improved self description
  - self-describe size limits for uploads
  - improved error reporting
  sword:errorcode with namespace (and with description)
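
To illustrate what a namespaced error code with a description might look like, here is a sketch that builds and reads back an Atom entry carrying a sword:errorcode element. The element was only a proposal at this point, so the SWORD namespace URI, the element layout and the error code URI are all assumptions of mine.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
SWORD_NS = "http://purl.org/net/sword/"  # assumed SWORD namespace URI

ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("sword", SWORD_NS)


def build_error(code_uri, description):
    """Serialize an Atom entry describing a failed deposit."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = "Deposit failed"
    # Hypothetical element: a machine-readable, namespaced error code.
    ET.SubElement(entry, f"{{{SWORD_NS}}}errorcode").text = code_uri
    # Human-readable description of the same error.
    ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = description
    return ET.tostring(entry, encoding="unicode")


def parse_error(xml_text):
    """Return (error code, description) from an error entry."""
    entry = ET.fromstring(xml_text)
    return (
        entry.findtext(f"{{{SWORD_NS}}}errorcode"),
        entry.findtext(f"{{{ATOM_NS}}}summary"),
    )


doc = build_error(
    "http://example.org/errors/PackageTooLarge",  # hypothetical code URI
    "Upload exceeds the size limit for this collection.",
)
code, description = parse_error(doc)
```

A client can then branch on the code URI without pairwise knowledge of each repository's error strings, which is the point of the proposal.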

Integration with complex workflows
- asynchronous notification

===

DSpace
Michele Kimpton

Interop

* Business
- need a defined business case / use case because the developer community is small

community will rally around common protocols

* operational
- policy transfer-control
  - embargo, authentication, dark archive...
- metadata loss
- identifier compatibility and acceptance

* technical
- numerous content packages
- representation incompatibilities
- interpretation of standards

Community Efforts

* OAI-PMH, OAI-ORE, SWORD, METS, IMS, SWAP
* federation across DSpace repositories
* working with key apps
* integration with "content creation" tools to ensure materials are deposited

===

issues: strong standardization of library *DATA*
        weak standardization of repository data

===

Les Carr
Eprints

drawing funny diagrams

user level interop

===

Sandy Payette
Fedora Commons and Interop

2007 Content Model Architecture (CMA)
- Registry of "content model" types for digital objects

Now: Simplicity

2008: Atom Syndication Format, OAI-ORE, simple common web APIs with wide appeal
and adopt other standards where possible

high-end interop (web services apis)
backend interop (Akubra) - various underlying storage - transactional stores, Sun HoneyComb,
Internet Archive PetaBox

* Topaz - application level objects and semantic interoperability

lightweight ways to let apps define object types

info objects mapped into triples and persisted in Mulgara triplestore

* Fedora Middleware Projects
- Simple JMS layer with e.g. Gsearch, OAI, Ingest on top

What do users really want interoperability to achieve?

Q (me): heavyweight APIs vs lightweight?
A: light for integration with web apps, heavy inside enterprise

===

Issues
- federation & interop
  - support for delete, update
  - document formats
- content creation opportunities
- content flow -> ingest

discussion of harvesting for search, Google Scholar

how are people providing federated search
- OAI-PMH
- one-off federated integration
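
For the OAI-PMH route, the harvesting loop behind a federated search index is small: request ListRecords, collect the record identifiers, and follow the resumptionToken until it runs out. A minimal sketch of the two building blocks, run against an inline sample response rather than a live repository (the base URL and identifiers are hypothetical):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for the first or a later page."""
    if resumption_token:
        # Later pages carry only the token, not the original arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text.strip())
    ids = [e.text for e in root.iter(f"{{{OAI_NS}}}identifier")]
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return ids, (token or None)


SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.org:1</identifier></header></record>
    <record><header><identifier>oai:repo.example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

first_url = list_records_url("http://repo.example.org/oai")
ids, token = parse_page(SAMPLE_RESPONSE)
next_url = list_records_url("http://repo.example.org/oai", resumption_token=token)
```

A harvester would repeat this per repository and feed the records into one index, which is the "harvest then search" alternative to querying each repository live.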

Andy said something like "there's fundamental tension between simple and complex".
You can find Andy's liveblogging of the event through his Twitter stream

http://twitter.com/andypowe11
