April 1, 2008 - I had read the posting by Savas (probably via Lorcan), so it was great to have an opportunity to hear about Microsoft's thinking directly from them. The most dramatic announcement was that Microsoft Research will be developing entirely on the Linux platform.
UPDATE: Lee Dirks said I almost gave him a heart attack with my little April Fools' prank, and the day is wearing on, so it's time to update and move my text up from the bottom...
Thanks go to Lee Dirks and David Flanders for making my first full day
in Southampton a very interesting one. The Linux platform bit was my
contribution to April Fools. MS Research Tech Computing are, of course, entirely
dedicated to Microsoft platforms. ENDUPDATE
For further discussion of the MS Repository Platform efforts, they have created a group
http://community.research.microsoft.com/forums/90.aspx
I'm sure it has happened before, but it was the first time I had seen the leads/directors of Fedora (Sandy Payette), DSpace (Michele Kimpton) and Eprints (Les Carr) brought together.
There was a lot about SWORD and also some on OAI-ORE.
Notes on Microsoft Summit on Repository Interoperability event
Lee Dirks
External Research, Technical Computing
- Putting computing into science
- Putting science into computing
Science + computation are not the entire equation
* Microsoft must improve its offerings throughout the scholarly communication lifecycle
Approach: Conduct prototyping projects and proofs-of-concept to evolve Microsoft's scholarly
communication offerings
Five factors Microsoft considers key
* Interop is paramount
* Optimize for data-driven research & science
* Data preservation (and provenance) should be baseline
* Community protocols & conventions
* Social networking & semantic knowledge discovery
When possible, IP is shared at
http://www.codeplex.com/
Project Execution Models
* internal FTE
* external devel (vendor)
* external devel (institutional partner)
* mixed models
projects 1-2 years
Examples:
* GenePattern for Word 2008
- integrate data and images from GenePattern workflows into research papers
- will move into production in April/May 2008
* Math in Word 2007
* Chemistry Drawing for Office 15
- Peter Murray-Rust et al.
- Chemistry Markup Language (CML)
- proof-of-concept plugin ... but two versions of Office from now, Chemistry will be built-in (we hope)
* PLANETS
- EU project
- preservation of Office documents based on Office OpenXML (OOXML)
===
Savas
"Supporting researchers worldwide"
working towards an "eResearch Platform", a grouping of Microsoft tools that can support research
Flow: Author->Publish->Archive->Discover
Author
* Semantic Annotations for Word
(current target: protein databank)
* NLM DTD plug in - will support SWORD
- export a Word document in NLM DTD -> .nlmx
* Research Ribbon concept - tools relevant to researchers in Office
* can search arXiv from within Word using OpenSearch
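As a rough illustration of the OpenSearch idea, a client fills in a URL template and fetches the resulting feed. The sketch below only builds the URL. The template string is an assumption made for this example (it follows the query format of the public arXiv API, not necessarily whatever the Word add-in uses), and `fill_template` is a hypothetical helper.

```python
from urllib.parse import quote

# Assumed OpenSearch-style URL template; {searchTerms}, {startIndex?} and
# {count?} are standard OpenSearch 1.1 template parameters.
ARXIV_TEMPLATE = (
    "http://export.arxiv.org/api/query"
    "?search_query=all:{searchTerms}&start={startIndex?}&max_results={count?}"
)


def fill_template(template, terms, start=0, count=10):
    """Substitute the OpenSearch parameters into the template."""
    return (
        template.replace("{searchTerms}", quote(terms, safe=""))
        .replace("{startIndex?}", str(start))
        .replace("{count?}", str(count))
    )


search_url = fill_template(ARXIV_TEMPLATE, "open access", count=5)
```

The response to such a query would be an Atom feed that the client can render inside the authoring tool.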
Publish
* Conference Management Tool (also SWORD endpoint)
* eJournal - manage peer review (also SWORD endpoint)
Archive
* Research Output Repository (also SWORD endpoint and will support OAI-ORE)
* arXiv (also SWORD support)
? Repository interop/federation
Q: Shibboleth / OpenID support?
A: haven't started looking at it yet
===
Santosh
Microsoft's Research Output Repository Platform
Platform for storing scholarly works and metadata
- papers, videos, presentations, lectures, references...
- enables the development of new functionality and services on top of the platform
- relationships between stored entities
* SQL Server 2005 or 2008, Entity Framework, .NET 3.5
* the repository software (but not the servers) will be available to the community for free
Platform Overview
- variety of resource types (publications, tech reports etc.)
- resource tagging
- relationship between resources (triple-based)
- set of well-known predicates (IsVersionOf, Contains, etc.)
- new resource types and predicates through extensibility
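To make the triple-based model concrete, here is a minimal sketch of resources linked by predicates, with a registry that can be extended. It is a toy in-memory store written for these notes, not the platform's Core API; the class names, the resource identifiers and the `Cites` predicate are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Toy in-memory store; not the platform's actual API."""

    def __init__(self, predicates):
        # Start from a set of well-known predicates; more can be registered.
        self.predicates = set(predicates)
        self.triples = set()

    def register_predicate(self, name):
        self.predicates.add(name)

    def add(self, subject, predicate, obj):
        if predicate not in self.predicates:
            raise ValueError(f"unknown predicate: {predicate}")
        self.triples.add(Triple(subject, predicate, obj))

    def objects(self, subject, predicate):
        """Return everything the subject points at via the predicate."""
        return sorted(
            t.obj
            for t in self.triples
            if t.subject == subject and t.predicate == predicate
        )


store = TripleStore(["IsVersionOf", "Contains"])
store.add("paper-v2", "IsVersionOf", "paper-v1")
store.add("proceedings-2008", "Contains", "paper-v2")
store.register_predicate("Cites")  # hypothetical extension predicate
store.add("paper-v2", "Cites", "tech-report-7")
```

Rejecting unknown predicates is one possible design choice; it keeps the vocabulary explicit while still allowing extension through `register_predicate`.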
Platform
* Core API
* Framework API
* OAI-PMH, Syndication, BibTeX, Search
- UI Web Controls
"A semantic computing platform"
- hybrid between relational database and a triple store
community.research.microsoft.com/forums/90.aspx
===
Stewart Lewis
Update on SWORD Protocol & Future Directions
http://www.ukoln.ac.uk/repositories/digirep/index/SWORD
- Simple Web Service Offering Repository Deposit
JISC/CETIS end of 2005
- identified lack of standard deposit API as #1 issue
2006: Creation of Repository Deposit working group
November 2006
- JISC call for funding, bid submitted for SWORD
- Julie Allinson
- lightweight and agile project
Workpackage 1: Evaluate existing standards
- WebDAV
- JSR
- OKI OSID
- ECL
- SRW Update
- SPI Google Data API
- ATOM Publishing Protocol (APP)
-> page on wiki examining them all
Workpackage 2: Tech Dev
- DSpace
- Fedora
- Eprints
- intraLibrary
* Java client library
- command line, desktop app, web interface
Workpackage 3: User testing and feedback
- arXiv
- SOURCE
- SPECTRa
- White Rose Research Online
- FeedForward
How does SWORD work?
* Two stages
- Discover
GET a Service Document
- Deposit
POST an item to the URI of the collection
GET
- X-On-Behalf-Of
- get a URI
POST
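The two stages can be sketched as follows: parse a service document to find collection URIs, then prepare a POST to one of them. Nothing is sent over the network here. The sample service document, the host name and the `HttpRequest` helper are invented for this sketch, and the APP namespace is assumed to be the RFC 5023 one.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from urllib.parse import urljoin

APP_NS = "http://www.w3.org/2007/app"  # RFC 5023 namespace (assumed here)
ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical service document, as a repository might return it.
SAMPLE_SERVICE_DOC = f"""
<service xmlns="{APP_NS}" xmlns:atom="{ATOM_NS}">
  <workspace>
    <atom:title>Example Repository</atom:title>
    <collection href="/sword-app/deposit/theses">
      <atom:title>Theses</atom:title>
    </collection>
    <collection href="/sword-app/deposit/articles">
      <atom:title>Articles</atom:title>
    </collection>
  </workspace>
</service>
"""


@dataclass
class HttpRequest:
    """Plain description of a request; a real client would send it."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def discover_request(service_url, on_behalf_of=None):
    """Stage 1: GET the service document."""
    headers = {}
    if on_behalf_of:
        headers["X-On-Behalf-Of"] = on_behalf_of
    return HttpRequest("GET", service_url, headers)


def collection_uris(service_doc_xml, base_url):
    """Map collection titles to absolute deposit URIs."""
    root = ET.fromstring(service_doc_xml)
    uris = {}
    for coll in root.iter(f"{{{APP_NS}}}collection"):
        title = coll.findtext(f"{{{ATOM_NS}}}title", default="")
        uris[title] = urljoin(base_url, coll.get("href"))
    return uris


def deposit_request(collection_uri, package, content_type="application/zip"):
    """Stage 2: POST the package to the chosen collection URI."""
    return HttpRequest("POST", collection_uri, {"Content-Type": content_type}, package)


service_url = "http://repo.example.org/sword-app/servicedocument"
get_req = discover_request(service_url, on_behalf_of="jsmith")
uris = collection_uris(SAMPLE_SERVICE_DOC, service_url)
post_req = deposit_request(uris["Theses"], b"...zip bytes...")
```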
SWORD extensions to APP
* SWORD level
- 0
- basic
- 1
- full implementation
- X-On-Behalf-Of
- X-Verbose
- X-No-Op
- X-Format-Namespace
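A small sketch of how a client might assemble these extension headers for a deposit. The helper name is hypothetical, and the "true" string values for the boolean headers are an assumption about the wire format rather than something stated in the talk.

```python
def sword_deposit_headers(on_behalf_of=None, verbose=False, no_op=False,
                          format_namespace=None):
    """Build the optional SWORD extension headers for a deposit."""
    headers = {}
    if on_behalf_of:
        # Deposit for another user (mediated deposit).
        headers["X-On-Behalf-Of"] = on_behalf_of
    if verbose:
        # Ask the server for a more detailed response.
        headers["X-Verbose"] = "true"
    if no_op:
        # Dry run: process the request but do not store anything.
        headers["X-No-Op"] = "true"
    if format_namespace:
        # Identify the packaging format of the deposited item.
        headers["X-Format-Namespace"] = format_namespace
    return headers


dry_run = sword_deposit_headers(on_behalf_of="jsmith", no_op=True)
```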
Discovery SWORD interfaces
* Recommend /sword-app
* Recommend /sword-app/servicedocument
* Recommend <link rel="sword" href="/sword-app/servicedocument" />
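One way a client could use these recommendations: look for a `<link rel="sword">` element in the repository's home page and fall back to the recommended path if none is found. This is a sketch using only the standard library; the function and class names and the example URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SwordLinkFinder(HTMLParser):
    """Collect href values of <link rel="sword"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "sword" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_service_document(html, page_url):
    """Return the advertised service document URL, or the recommended default."""
    finder = SwordLinkFinder()
    finder.feed(html)
    href = finder.hrefs[0] if finder.hrefs else "/sword-app/servicedocument"
    return urljoin(page_url, href)


page = '<html><head><link rel="sword" href="/deposit/service" /></head></html>'
found = find_service_document(page, "http://repo.example.org/index.html")
fallback = find_service_document("<html></html>", "http://repo.example.org/index.html")
```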
Authentication
- Required: HTTP BASIC
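For reference, HTTP Basic authentication just means sending the base64-encoded `username:password` pair in an `Authorization` header. A minimal sketch (the helper name is mine):

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


auth = basic_auth_header("Aladdin", "open sesame")
```

Since the credentials are only encoded, not encrypted, this is normally used over HTTPS.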
What?
- any package supported by the repository
- DSpace/Eprints: ZIP files with a METS manifest in SWAP format, with files
- Fedora: image files / METS documents (pull in referenced data streams)
- OAI-ORE resource maps
SWORD 2
- follow-on project
? more APP
? UPDATE / DELETE
? more clients
? client libraries
? provide support to users
Q: What is relationship with APP?
A: none
Comment: Sandy - We need a basic protocol that supports read and write.
Comment: Michele - We need to get into workflow - Zotero, EndNote etc.
Q: OAI-ORE and SWORD together?
===
Experience implementing SWORD at arXiv.org
Simeon Warner
Thorsten Schwander
1. Background
2. SWORD implementation choices
3. Ideas for SWORD evolution
automating from Microsoft Conference Toolkit
CS unusual in that conference publications very important
- use arXiv to host open access proceedings
work internally at arXiv to present conference proceedings as a whole
http://arxiv.org/help/api
Authority
1. author
2. the conference organizer
3. the CMT system (will use the organizer's authority)
returning errors
- all additional errors returned HTTP 400 Bad Request
- return an Atom document for each error code
3. Ideas for SWORD evolution
* Primary goal should be to reduce pairwise customization
- improved self description
- self-describe size limits for uploads
- improved error reporting
sword:errorcode with namespace (and with description)
Integration with complex workflows
- asynchronous notification
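To picture the error reporting idea, here is a sketch of an Atom entry carrying a namespaced `sword:errorcode` plus a human-readable description, and of a client reading it back. The element layout, the SWORD namespace URI used here and the error code value are assumptions for illustration, not what arXiv actually returns.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
SWORD_NS = "http://purl.org/net/sword/"  # assumed namespace for the extension

ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("sword", SWORD_NS)


def error_document(code, description):
    """Server side: build an Atom entry describing one error."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = "Deposit failed"
    ET.SubElement(entry, f"{{{SWORD_NS}}}errorcode").text = code
    ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = description
    return ET.tostring(entry, encoding="unicode")


def parse_error(xml_text):
    """Client side: extract the machine-readable code and the description."""
    entry = ET.fromstring(xml_text)
    return (
        entry.findtext(f"{{{SWORD_NS}}}errorcode"),
        entry.findtext(f"{{{ATOM_NS}}}summary"),
    )


# "MaxUploadSizeExceeded" is a hypothetical error code.
doc = error_document("MaxUploadSizeExceeded", "Package is larger than the upload limit")
code, description = parse_error(doc)
```

A machine-readable code lets a generic client react without repository-specific parsing, which is the point of reducing pairwise customization.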
===
DSpace
Michele Kimpton
Interop
* Business
- need a defined business case / use case, because the developer community is small
community will rally around common protocols
* operational
- policy transfer-control
- embargo, authentication, dark archive...
- metadata loss
- identifier compatibility and acceptance
* technical
- numerous content packages
- representation incompatibilities
- interpretation of standards
Community Efforts
* OAI-PMH, OAI-ORE, SWORD, METS, IMS, SWAP
* federation across DSpace repositories
* working with key apps
* integration with "content creation" tools to ensure materials are deposited
===
issues: strong standardization of library *DATA*
weak standardization of repository data
===
Les Carr
Eprints
drawing funny diagrams
user level interop
===
Sandy Payette
Fedora Commons and Interop
2007 Content Model Architecture (CMA)
- Registry of "content model" types for digital objects
Now: Simplicity
2008: Atom Syndication Format, OAI-ORE, simple common web APIs with wide appeal
and adopt other standards where possible
high-end interop (web services apis)
backend interop (Akubra) - various underlying storage - transactional stores, Sun HoneyComb,
Internet Archive PetaBox
* Topaz - application level objects and semantic interoperability
lightweight ways to let apps define object types
info objects mapped into triples and persisted in Mulgara triplestore
* Fedora Middleware Projects
- Simple JMS layer with e.g. Gsearch, OAI, Ingest on top
What do users really want interoperability to achieve?
Q (me): heavyweight APIs vs lightweight?
A: light for integration with web apps, heavy inside enterprise
===
Issues
- federation & interop
- support for delete, update
- document formats
- content creation opportunities
- content flow -> ingest
discussion of harvesting for search, Google Scholar
how are people providing federated search
- OAI-PMH
- one-off federated integration
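For the OAI-PMH route, a harvester repeatedly requests `ListRecords` from each repository and indexes the results centrally. The sketch below only builds the request URLs; the base URL is a placeholder and the helper name is mine.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: only the verb goes with it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


first_page = list_records_url("http://repo.example.org/oai")
next_page = list_records_url("http://repo.example.org/oai", resumption_token="abc123")
```

The harvester keeps following resumption tokens until the repository returns an empty one.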
Andy said something like "there's fundamental tension between simple and complex".
You can find Andy's liveblogging of the event through his Twitter stream
http://twitter.com/andypowe11