Posts categorized "Digital Library"

March 20, 2008

Open Repositories 2008

Through an unexpected series of events I find myself going to Open Repositories 2008

http://or08.ecs.soton.ac.uk/

The lineup looks great including a keynote from Peter Murray-Rust, and two (!) sessions on Scientific Repositories.

There is also a Repository Challenge for developers with a £2,500 prize, which is like a million US dollars now (finally, Canadians get to make US dollar jokes).  Kudos to David Flanders for leading this "let's just build stuff and see what works" approach.

I will be blogging under tag/category or08, and twittering under hashtag #or08

I made an Upcoming event, mainly because then if you add the machine tag

upcoming:event=455039

to your Flickr photos, it will automatically put in a nice "Taken at Open Repositories 2008" logo.
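
For anyone who hasn't met machine tags before, they follow a namespace:predicate=value pattern. Here is a minimal Python sketch of pulling one apart. The helper name parse_machine_tag is my own invention for illustration, not part of any Flickr or Upcoming API.

```python
from typing import NamedTuple


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_machine_tag(tag: str) -> MachineTag:
    """Split 'namespace:predicate=value' into its three parts (hypothetical helper)."""
    head, equals, value = tag.partition("=")
    namespace, colon, predicate = head.partition(":")
    # Every piece must be present for the string to count as a machine tag.
    if not (equals and colon and namespace and predicate and value):
        raise ValueError(f"not a machine tag: {tag!r}")
    return MachineTag(namespace, predicate, value)


tag = parse_machine_tag("upcoming:event=455039")
print(tag.namespace, tag.predicate, tag.value)
```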

February 25, 2008

the paperless home?

The New York Times article Pushing Paper Out the Door has been circulating around at work and it actually did inspire me to think about better paper handling at home.

First off, a clarification: there is no way my books are going away.

But I would like to get rid of the paper output of the consumer cycle: various bills, credit card receipts etc.

My plan is: scan->organise and backup digital object->shred paper.

The lifehacker Going Paperless at Home? answers are not quite aligned with what I need.

A lot of what the lifehacker people suggest is about going to electronic billing and payment.  Canada has had online banking for a long time, so that's not an issue for me, I did that over a decade ago.

Inputs

  • junk paper mail - goes directly into recycling bin in mailroom anyway - can also reduce with http://www.reddotcampaign.ca/
  • bills/statements I still get in paper - although I pay all my bills online, I still get some statements mailed in paper, simply because I trust "information I hold myself in my house" more than "information offsite that could be changed at any time" in case I have to challenge a billing item
  • credit and debit card receipts - Canada also went to electronic point-of-sale payment a long time ago, but although I have no paper money, I have tons of paper receipts from every transaction.  It would be nice if instead of a paper receipt you could just plug in a memory key and have it "print to storage" - and yes, I reconcile my receipts with my statements each month, using an old version of Microsoft Money
  • miscellaneous incoming paper mail

There are some recommendations for the SnapScan scanner, but it's showing at over US$400 on Amazon.
I got a shiny multifunction flatbed instead; it's Mac and PC compatible.

For me, the main issue is organisation.  Most people on lifehacker seem to be scanning to PDF, and letting their desktop search handle the rest.

I'd basically like Picasa for scans, except with an extra layer of organisation.
I want to be able to organise scans by:

  • date scanned
  • date on document
  • types and categories of documents

That should be pretty easy, but I haven't found anything that quite does what I want yet.  Are there photo organisers that also handle PDF?  Or should I look for specific scan organising software?  Maybe I could use one of the tools that is for management of downloaded PDF articles, and use it generically for all PDFs, i.e. a reference manager for receipts?
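
To make the "pretty easy" part concrete, here is a rough Python sketch of the record I have in mind, with the three organising axes as fields. It is only an illustration: the class name, field names and file names are all made up, and a real tool would also need storage and a user interface.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Scan:
    path: str
    date_scanned: date
    date_on_document: date
    doc_type: str                      # e.g. "receipt", "statement"
    categories: List[str] = field(default_factory=list)


def by_type(scans, doc_type):
    """Scans of one document type, ordered by the date printed on the document."""
    matching = [s for s in scans if s.doc_type == doc_type]
    return sorted(matching, key=lambda s: s.date_on_document)


# Hypothetical sample data.
scans = [
    Scan("scan-0002.pdf", date(2008, 2, 24), date(2008, 2, 10), "receipt", ["groceries"]),
    Scan("scan-0001.pdf", date(2008, 2, 24), date(2008, 1, 31), "statement", ["credit card"]),
    Scan("scan-0003.pdf", date(2008, 2, 25), date(2008, 2, 3), "receipt", ["books"]),
]
for s in by_type(scans, "receipt"):
    print(s.date_on_document, s.path, s.categories)
```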

Maybe something like Bookends?  Or one of the others from this Mac list of reference managers?
What about PC equivalents?

Or maybe I should run my own home digital library or institutional repository?  (I am only half-joking.)

We've put so much thought and technology into managing digital objects for libraries, surely there must be some of that I can reuse at home?

What are you using to manage scanned and born-digital paper at home?

(Born-digital paper?  What an age we live in.)

February 08, 2008

ECDL 2008 in Denmark

ECDL 2008 will be September 14-19, 2008 in Aarhus (Århus) Denmark.  Birte Christensen-Dalsgaard is General Chair.

Important dates:
Workshop submission deadline: 15 February 2008
Registration opens: 03 March 2008
Paper, Tutorial, Poster & Demo submission deadline: 14 March 2008

Topics of contributions include:

    * Concepts of Digital Libraries and digital content
    * Collection building, management and integration
    * System architectures, integration and interoperability
    * Information organisation, search and usage
    * Multilingual information access and multimedia content management
    * User interfaces for digital libraries
    * User studies and system evaluation
    * Digital archiving and preservation: methodological, technical and legal issues
    * Digital Library applications in e-science, e-learning, e-government, cultural heritage, etc.
    * Web 2.0 and associated technologies

Previously:
My notes from ECDL 2006

November 08, 2007

presentations from DLF Fall Forum 2007

Some of the presentations are posted on

http://www.diglib.org/forums/fall2007/2007fallprogram.htm

If you want to locate them, just search for [PRESENTATION]

October 15, 2007

IJDL special on e-Science and Digital Libraries

The International Journal on Digital Libraries (IJDL), Vol. 7, No. 1-2, October 2007, pp. 1-122, ISSN 1432-5012 (Print) 1432-1300 (Online) is a special feature on e-Science and Digital Libraries.

Within the issue is a set of ten articles (six long and four short) representing a range of perspectives on eScience, and the use of digital libraries to organize science collections, that will be of interest to both the eScience and digital library communities. The articles highlight the synergies and differences between the communities, and the challenges present in managing
massive collections.

...

the digital library community is concerned with the scholarly life cycle, an essential component of eScience practices that are driven by the nature of scientific scholarship. As such, there will be benefits from increased partnership between the two communities. A closer partnership between the two communities can be developed around three areas:

• Support for the range of the scholarly communication lifecycle
• The role of data within both communities
• Broader participation of the digital library community in eScience

Connecting digital libraries to eScience: the future of scientific scholarship
doi:10.1007/s00799-007-0030-9

There are many topics that you will have read about in my blog before, including

October 08, 2007

DLF Fall Forum 2007

The Digital Library Federation (DLF) Fall Forum 2007 will be November 5-7, 2007 in Philadelphia.  I just realised I haven't written anything about it.

I am scheduled to present "Service-Oriented Architecture for Libraries" on Tuesday 6th.

For blogging etc. I will be using tag/category DLFfall2007.

I'm interested to go because although I have blogged a number of times about the DLF, I haven't actually been to a DLF event before this.

Previously:
July 24, 2007  software development, staffing and new library technology
March 30, 2007  pondering SOA the UK and Aussie way
January 30, 2007  report on Library Future Roles webcast - in which I ask Peter Brantley a question
April 3, 2006  standards and discoverability for library Web Services - DSLR Workshop
September 6, 2005  library service frameworks: avoiding wheel reinvention

June 02, 2007

audio interview with Peter Brantley about library service collaboration

a 19 minute [audio] interview with Peter Brantley, new Executive Director of the Digital Library Federation

via EDUCAUSE Connect - gbayne's blog - An Interview with Peter Brantley at CNI's 2007 Spring Task Force Meeting

via ResourceShelf

The main topics as I heard them:

  • the challenge of getting libraries to collaborate deeply and meaningfully on technology
  • determining what kinds of technology services (network-level services) libraries can usefully contribute to an already-rich discovery environment
  • how to put those kinds of library collaborations together in a way that includes the ability to sustain and support the new technology services
  • the challenge of shifting staff to support new areas of activity

I wonder if I've been using the wrong terminology all this time.  I've been talking and talking about library Service-Oriented Architecture with (as far as I can tell) very little uptake.  And I will be talking about it more in an upcoming presentation and an article in Library Journal, and one of my colleagues has an article upcoming in a different journal as well.  It is hard to sustain one's enthusiasm indefinitely, however, in the absence of much response.

We already have framework upon framework, but I can't seem to get anyone to talk about how they might inter-relate.  Are there communication barriers?  Terminology barriers?  Resource barriers?  Culture barriers?

I wonder whether we shouldn't really be using language more like "library technology service collaboration" or "library network-level enhancement collaboration".  I think maybe when we say "digital library" or SOA, people say, "oh, that's not me".  But it is.  We're all digital libraries now.  What the whole collaborative library SOA idea is about is more libraries producing and consuming more network services, so that we can all better participate in the online experience.

Previously:
January 30, 2007  report on Library Future Roles webcast
September 26, 2005  Info Grid 2005 - Monday 26th - Deploying Services, Not Libraries

March 01, 2007

article on Federal Science eLibrary Pilot

Federal Science eLibrary Pilot: Seamless, equitable desktop access for Canadian government researchers (or try alternate DOI-based URL)

Author(s): Beverly Brown, Cynthia Found, Merle McConnell
[Beverly Brown and Cynthia Found work at CISTI.]
Journal: The Electronic Library
ISSN: 0264-0473
Year: 2007 Volume: 25 Issue: 1 Page: 8-17
DOI: 10.1108/02640470710729083
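
A DOI-based URL is just the DOI appended to a resolver address. A small Python sketch, assuming the public doi.org resolver; the helper name doi_to_url is made up for illustration.

```python
from urllib.parse import quote


def doi_to_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Build a resolver URL for a DOI, percent-encoding any unsafe characters."""
    return resolver + quote(doi.strip(), safe="/")


print(doi_to_url("10.1108/02640470710729083"))
```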

Conclusions and next steps

Participants who used the pilot eLibrary overwhelmingly indicated it had a positive or very positive impact on their work activities and productivity. Specific positive impacts included:

    * pilot users cited significantly reduced time spent finding and verifying information, allowing them to concentrate on critical activities such as manuscript preparation, peer review activities, professional reading and other research and laboratory activities;
    * pilot site librarians felt that offering coordinated access to more electronic journals through a Federal Science eLibrary would free their time for other professional activities, allowing them to serve their clients better and better meet their expectations for e-content; and
    * remote users found increased access to e-journals had a positive impact on their ability to stay current and find needed information while in the field.

The NRC-CISTI infostructure proved to be a reliable platform for the pilot project. This type of infostructure could be used to support the delivery of a Federal Science eLibrary service to federal government researchers anywhere in Canada.

February 19, 2007

German scientific library services

Because how often do I get to use "Deutsche Forschungsgemeinschaft" in a sentence?

DFG - Scientific Library Services and Information Systems: Funding Priorities Through 2015 (PDF)

9 pages, English.

I think Lorcan will like this:

Today's libraries, archives and other specialised information services operate largely independently of one another.  These different institutions must integrate into a coherent nationwide network for the provision of digital information for science and the humanities.  By creating a digital environment within universities and research institutes in which digital channels become the standard medium for accessing, analysing and publishing research data and scientific results, libraries can become the cornerstone of e-science.

Another part that jumped out at me was about their scientific information portal

The creation of Vascoda, the nucleus of a "German Digital Library," sponsored mutually by libraries and specialised information services, represented an important building block in the development of an integrated system of national information provision.

via In Between

(In case you're wondering

The Deutsche Forschungsgemeinschaft [DFG] (German Research Foundation) is the central, self-governing research funding organisation that promotes research at universities and other publicly financed research institutions in Germany.

)

February 03, 2007

eduSource - not dead yet?

I don't know what the state of learning object repositories is in Canada.
Via Ten Thousand Year Blog I find what are apparently the vestiges of the eduSource project

http://edusource.licef.teluq.uquebec.ca/ese/en/index.jsp

Previously:
December 20, 2006  Web Services for Learning Object Repositories

January 30, 2007

report on Library Future Roles webcast

I liked the EDUCAUSE Webcast Architectures for Collaboration: Roles and Expectations for Digital Libraries.  Peter Brantley (currently of CDL, soon to be Director of the Digital Library Federation) gave a very clear-eyed assessment of the current state of library IT engagement, and proposed many areas in which the library can do better.

I have to admire the boldness of a presentation whose 4th slide is "Libraries have failed in critical ways".

[Digital-Library-Architecture-snap]

He said that a key role that libraries have to embrace is to enable discovering books, and searching inside of them.

He had many well-constructed phrases, I particularly liked "technology wipes out our current understanding of how libraries engage the world".

He then moved on to explore the many roles that libraries can still play, both in the physical and digital worlds (including Second Life).

I asked a rather long question about the DLF Architectures group, their DLF Service Framework, library SOA and relationships to the JISC e-Framework.

In his response (and several other times in the presentation) he highlighted the DLF Aquifer project.

If I understand it correctly, it's about building frameworks for discovery and use of distributed digital resources.

We support scholarly discovery and access by:

    * Developing schemas, protocols and communities of practice to make digital content available to scholars and students where they do their work
    * Developing the best possible systems for finding, identifying and using digital resources in context

When asked about new DLF initiatives he has planned, he indicated a few key areas:

  • mass digitization
  • policy issues (e.g. rights, privacy)
  • new virtual communities
  • new media and discourse

He had a lot of ideas about engaging with faculty and students, as well as making better connections between the library and IT, and between libraries and publishing.

In particular, he said that libraries should explore the community aspect of dealing with faculty and students - libraries as enablers of community.

In terms of IT engagement he said libraries need to move together with IT to enable deep information discovery.

He also talked about ensuring that we take full advantage of the potential of virtual worlds, and not artificially constrain ourselves to old ways of operating in them - new libraries in new worlds, or to paraphrase: No OPAC IN SECOND LIFE!

He said we need to rethink social learning, recognizing that virtual spaces are social spaces with the added aspect that they are freed from normal physical constraints.

He also talked about investigating new social media, having the library as a richer and more imaginative participant in the digital publishing process.  Again a paraphrase: "imagine what the book looks like in an interactive, networked future."

He said libraries need to re-engineer scholarly communication with faculty and with the public.

He mentioned a critical task, which is that we as libraries need to learn how to get our content harvested and ranked appropriately by search engines, perhaps by working more closely with the companies making such engines.

He identified "Collaborative problems, collaborative solutions":

  • massively distributed information
  • rich data
  • new indexing architectures
  • mining and mapping our data to build interactive linkages
  • challenge of providing ubiquitous access

Successful digital libraries bridge communities, build new services, and help others discover new services to build.

Digital Libraries are the architects of collaboration.

Publishing should be increasingly online and interactive, there should be digital workflows.

Challenge of "open access" to our applications - challenge for IT and libraries.
He talked a lot about how to permit remix, mashups, and combinations.

Of course my solution to that challenge is to ensure that libraries are building on a good service-oriented framework.  I very much hope that the DLF will further pursue SOA initiatives under his guidance.

Thanks to Glen for pointing out this webcast.

In case it's not clear, you can hear the audio of the presentation (which is essential, as the slides are just very high-level talking points) by following the instructions on how to log in to the "HorizonWimba archive" on the presentation page.  Once you're in the interface, switch to the Archives tab and scroll way way down to the bottom.  (Unfortunately, I currently get 503: Service Unavailable from their QuickTime server.)

UPDATE 2007-01-31: I was able to play the QuickTime audio on both my home Mac and home Windows box, with Firefox.  On the Mac the slides didn't display though.  It would be nice if EDUCAUSE offered just the audio as an MP3 / podcast.

September 28, 2006

Universidad Complutense de Madrid joins Google Books Project

The Library of the Universidad Complutense de Madrid and Google have signed a cooperation agreement to digitize all of the Biblioteca Complutense's collections that are free of copyright. Digital copies of these works will be produced, and they will be freely retrievable from Google (by searching the full text) and from the Library's catalogue.

Proyecto de digitalización Biblioteca Complutense-Google

Inside Google Book Search (blog) - Madrid's Complutense University opens its library to the world

Browsing the library stacks at the University Complutense of Madrid is like taking a trip through the great moments of Spanish and Latin American literature: Miguel de Cervantes, Quevedo, Calderón, Sor Juana de la Cruz, Garcilaso de la Vega.

Interesting that they don't mention the already-existing Biblioteca Virtual Miguel de Cervantes (Digital Library Miguel de Cervantes).

September 27, 2006

ECDL 2006 and DLSci06 proceedings

The proceedings (PDF) from the workshop Digital Library Goes e-Science (DLSci06) are now freely available online.

As well, the proceedings from the European Conference on Research and Advanced Technology for Digital Libraries, 2006 (ECDL 2006) are available in paper or digital format, but not for free.  You will need a Springer subscription.

Julio Gonzalo, Costantino Thanos, M. Felisa Verdejo, Rafael C. Carrasco (Eds.): Research and Advanced Technology for Digital Libraries, 10th European Conference, ECDL 2006, Alicante, Spain, September 17-22, 2006, Proceedings. Lecture Notes in Computer Science 4172 Springer 2006, ISBN 3-540-44636-2

The Digital Bibliography & Library Project (DBLP) provides handy links to the individual articles, which are available (if you have licensed access) through SpringerLink. See their page on ECDL 2006 proceedings, or their links for all previous ECDL conference proceedings.

The presentations (i.e. the PowerPoint files) are not online yet; I don't know when they will be.

September 24, 2006

ECDL 2006 - DLSci06 - DILIGENT grid-based DL

Leonardo Candela, Donatella Castelli, Christoph Langguth, Pasquale Pagano, Heiko Schuldt, Manuele Simi and Laura Voicu
On-Demand Service Deployment and Process Support in e-Science DLs: the DILIGENT Experience

http://www.diligentproject.org/

(Also see notes from tutorial Distributed Infrastructures for Digital Libraries.)

Motivation

* research is a multidisciplinary and co-operative effort
* may use a virtual resource organization that doesn't last a long time or have DL expertise
* but the DL is an important tool

DELOS view: from DL to Knowledge Commons
* from content-centric to person-centric
* from info storage to communication and collaboration support
* from centrally-located text to distributed and heterogeneous data sources

New DL development model
* DL built by dynamically aggregating the needed resources
* new functionality combined in user-defined workflows

Service-Oriented Architecture over Grid Framework

[complex diagram]

* New functionality delivered by workflows of services

Services Overview

* Mediation
* Information Space Management
* Access
* User and Resource Management
* Presentation
  - user-oriented access point to the DL
  - plug and play community-specific tools
  - [something about JSR168 portlets and other stuff]
* Enabling
  - monitoring, other operations domain functions

Service Detail

* the Keeper Service
  - deploy and monitor user-defined virtual DLs

* the Information Service
  - gathers, stores and supplies information about the resources constituting DILIGENT
    and needed by the other services
  - XML-based resource profiles
  - push and pull modalities i.e. query and subscribe/notify
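
To illustrate the push and pull modalities, here is a toy Python registry. It is only a sketch under my own assumptions: the class and method names are invented, and profiles are plain dicts here rather than the XML-based profiles the presentation described. It is not DILIGENT's actual interface.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InformationService:
    """Toy registry of resource profiles with pull (query) and push (subscribe/notify)."""

    def __init__(self) -> None:
        self._profiles: Dict[str, dict] = {}
        self._subscribers: Dict[str, List[Callable[[str, dict], None]]] = defaultdict(list)

    def subscribe(self, resource_type: str, callback: Callable[[str, dict], None]) -> None:
        # Push: the caller is notified whenever a matching profile is published.
        self._subscribers[resource_type].append(callback)

    def publish(self, resource_id: str, profile: dict) -> None:
        self._profiles[resource_id] = profile
        for callback in self._subscribers[profile["type"]]:
            callback(resource_id, profile)

    def query(self, resource_type: str) -> List[str]:
        # Pull: the caller asks for the current state on demand.
        return sorted(rid for rid, p in self._profiles.items() if p["type"] == resource_type)


seen = []
info = InformationService()
info.subscribe("index", lambda rid, profile: seen.append(rid))
info.publish("index-1", {"type": "index", "host": "node-a"})
info.publish("store-1", {"type": "storage", "host": "node-b"})
print(info.query("index"), seen)
```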

Process Design and Validation

Implement complex services by combining existing ones, a.k.a. "programming in the large"
* control flow
* data flow
* transaction behavior and execution guarantees for concurrency and failure handling
* XML, SOAP and WSDL as technologies
* BPEL as a foundation for process specification
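
The idea of combining existing services can be sketched in a few lines of Python. This is plain function composition, not BPEL, and the two "services" are invented stand-ins. It just shows control flow (steps run in order), data flow (each result feeds the next step) and a very crude form of failure handling.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict], Dict]]


def run_process(steps: List[Step], data: Dict) -> Dict:
    """Run steps in order (control flow), passing each result to the next (data flow)."""
    completed: List[str] = []
    for name, service in steps:
        try:
            data = service(data)
        except Exception as exc:
            # Failure handling: report which steps had finished before the error.
            raise RuntimeError(f"step {name!r} failed after {completed}") from exc
        completed.append(name)
    return data


def extract_text(d: Dict) -> Dict:
    # Hypothetical service: normalise the document text.
    return {**d, "text": d["document"].lower()}


def index_terms(d: Dict) -> Dict:
    # Hypothetical service: build a sorted list of distinct terms.
    return {**d, "terms": sorted(set(d["text"].split()))}


result = run_process([("extract", extract_text), ("index", index_terms)],
                     {"document": "Grid Services for Digital Libraries"})
print(result["terms"])
```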

[diagram of tool]

Process Execution

* built on top of OSIRIS
* runs BPEL tasks?

ECDL 2006 - DLSci06 - TextGrid

Andreas Aschenbrenner, Peter Gietz, Marc Wilhelm Küster, Christoph Ludwig and Heike Neuroth
TextGrid - a modular platform for collaborative textual editing

http://www.textgrid.de/

BMBF e-Science Programme

http://www.bmbf.de/de/298.php

* 2005-2009
* 100 institutions
* 100 million Euros

* focuses
  - elearning
  - d-grid
  - knowledge networking

TextGrid - a community grid for the humanities

combined metadata, annotations, etc. on German documents

Workflow

[diagram]

Planning - Digitisation - Annotation - Publication

demo of Anjo Anjewierden's work

TextGrid goal is to build

* an open adaptable extensible infrastructure

4 layers of TextGrid

* tools
* services
* middleware
* Resources (texts)

Tools

both a grid-connected portal, and a portable rich client

Text Archives

Services

* standard-based, modular, open
* Web Services: registration, workflow, exposure

Middleware

* Data grid
* Service grid
* Information Services
* AAI - GridShib

His question:
* where can we find common components?

Q: Can we see the rich client?
A: Not yet, we plan to build it on Eclipse

September 23, 2006

ECDL 2006 - DLSci06 - Provenance Explorer

Kwok Cheung and Jane Hunter
Provenance Explorer - A Tool for Viewing Provenance Trails and Constructing Scientific Publication Packages

* Modelling scientific discovery process
* Tools for constructing and publishing scientific models

Scientific Publishing - increasing demands to
* publish raw and derivative data
* document precise provenance
* share data models
* enable duplication and validation
* protect IP
* facilitate training / learning objects

Huge number of elements you should capture to enable reproduction of experiments

Huge Knowledge Management Challenges
* very large data set
* distributed data sets
* etc.

Need to capture the provenance

* may display/share various levels of detail
* selective archiving

Lineage data/metadata

* workflow is prospective
* provenance is retrospective
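
The prospective/retrospective distinction is easy to show in code. In this Python sketch (my own illustration, not the Provenance Explorer implementation) the workflow is a plan written down before anything runs, and the provenance trail is a record of what actually happened when it ran. The step names and values are made up.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

# Prospective: the workflow is the plan, defined before execution.
workflow: List[Callable[[float], float]] = []


def step(fn: Callable[[float], float]) -> Callable[[float], float]:
    workflow.append(fn)
    return fn


@step
def calibrate(x: float) -> float:
    return x - 0.5


@step
def scale(x: float) -> float:
    return x * 2.0


def execute(value: float) -> Tuple[float, List[Dict]]:
    """Run the plan and return the result plus a retrospective provenance trail."""
    trail: List[Dict] = []
    for fn in workflow:
        result = fn(value)
        trail.append({"step": fn.__name__, "input": value, "output": result,
                      "at": datetime.now(timezone.utc).isoformat()})
        value = result
    return value, trail


final, trail = execute(3.0)
for record in trail:
    print(record["step"], record["input"], "->", record["output"])
```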

Capture the Semantic Descriptions of components

FUSION project

http://metadata.net/sunago/fusion.htm

ontologies: ABC, MPEG-7, FUSION, OME

Image labelling

* Rules-by-example
  - then you can automatically generate semantics

Based on built-up semantics, you can generate automatic displays.

Hypothesis testing interface

Also, can specify new areas of interest e.g. by identifying sparse areas in the 3D display of results.
This can automatically generate parameters for new experiments.

Extended Harmony ABC model for experiments.

http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Lagoze/

Result - extended ABC

Then: Modelling eScience Provenance

[various screenshots of the application]

Architecture

* JGraph
* Algernon inference engine
* Protege OWL
* Jena semantic web framework

Working on

* Scientific Model/Publication construction tools
* Search, Browse and Retrieval

[diagram of architecture]

Preservation of Composite Objects
* Use RDF/XML to package metadata
* Maintain preservation for both
  - composite objects
  - atomic objects
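
As a rough idea of what packaging a composite object in RDF/XML could look like, here is a Python sketch using only the standard library. The choice of Dublin Core terms (dcterms:title, dcterms:hasPart) and the example.org URIs are my assumptions; the talk did not say which vocabulary is used.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dcterms", DCTERMS)


def package(composite_uri: str, title: str, part_uris: list) -> str:
    """Describe a composite object and link it to its atomic parts."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": composite_uri})
    ET.SubElement(desc, f"{{{DCTERMS}}}title").text = title
    for uri in part_uris:
        # Each atomic object is referenced by URI, so it can be preserved separately.
        ET.SubElement(desc, f"{{{DCTERMS}}}hasPart", {f"{{{RDF}}}resource": uri})
    return ET.tostring(root, encoding="unicode")


xml_text = package("http://example.org/publication/1", "Example composite object",
                   ["http://example.org/data/1", "http://example.org/image/1"])
print(xml_text)
```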

PANIC Architecture

PANIC (Preservation webservices Architecture for Newmedia and Interactive Collections)

http://metadata.net/panic/

AONS

* Automated Obsolescence Notification Service

* Collaboration between
  - UQ
  - NLA
  - ANU?

September 22, 2006

ECDL 2006 - DLSci06 - eSciDoc

eSciDoc - a Scholarly Information and Communication Platform in the Age of eScience
Matthias Razum

This was a very good presentation.  I particularly appreciated the silo diagram.

[IMG_9798_crop.jpg]

http://www.escidoc-project.de/homepage.html

Scholarly Communication - Rip, Mix, Burn

* Scholarship inherently recursive
* Therefore scholars are both information consumers and information producers at the same time
* Referencing or reusing material of all times enables scholars to weave a knowledge network of related information objects
* Good scientific practice requires provenance data for objects and versioning

Example: Knowledge Network around cuneiform tablet

* Metadata
* Transcription - Metadata
  - Translation - Metadata
  - Annotation

[wonderful silo diagram]

What should an institutional repository be?

* institutional memory
* allow for reuse - allow for associating information objects in novel contexts
* support interdisciplinary work
* open, application-independent and flexible,
  thus laying the ground today for repurposing the information in future applications

Turning Static Objects into Living Knowledge

* e-Scholarship ( = e-science = e-research ) allows publishing all intermediate results of knowledge
  generation from first ideas, theories, discussions with peers to final results
* need to support users early in their work process, so they can share immediately with peers
* leads to interactive authoring environments with support for collaboration and annotations
* objects lose their static nature and become 'active nodes' in a network of knowledge

Q (Rachel): How to connect from a wiki to an institutional repository?
A: Maybe it is possible.  There was a project - NSDL - wiki based on Fedora.

eSciDoc
* 6 million euro five-year grant (2004-2009)
* aim to build an integrated information, communication and publishing platform for web-based
  scientific work
* NOT a research project, aims at establishing an innovative production system

[diagram]

* Repository at the core
* layer of services
* layer of security
* build apps on top of these
  - publication management
  - scholarly workbench
  - eLib
  - eLab Journal

  helper apps: user management

ideally, the scientists should be able to build apps on this platform

Publication Management

* Workflows
  - very complex - every institution different
* Metadata
  - everyone has their own idea what the correct scheme is

Scholarly Workbench

* collaboration, for humanities

eLib

* dark archive of all commercially-licensed content
(postponed)

eLabJournal

(postponed)

Services
* Object Manager
* Content Type Modeler
* Metadata Modeler
* Formats Manager
* Workflow Manager
* Data Interoperability
* Search & Browse
* License Management
* Personalization
* Basket Manager

Q (me): To what extent does it depend on Fedora?
A: In theory the services deal with the repository as an abstract layer, but in practice currently
   you must have Fedora

Framework Services

* The framework is an enabling technology
  - scholars can focus on domain-specific application logic
  - enable scholars to focus on their "business logic" / "scholarly logic"

Content Model

Q (Rachel): Are you using RDF containers?
A: Conceptually we are using METS containers.
Comment (Jane): Fedora uses RDF internally.

* some generic object patterns

* ability to attach licenses to content items

* ability to manage content item versions
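
To picture those last two points, here is a small Python sketch of a content item that keeps its versions and carries a licence on each one. It is purely illustrative: the class names, the carry-forward rule and the "CC BY" value are my own assumptions, not the eSciDoc content model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    number: int
    content: str
    license: Optional[str]


@dataclass
class ContentItem:
    item_id: str
    versions: List[Version] = field(default_factory=list)

    def add_version(self, content: str, license: Optional[str] = None) -> Version:
        # Carry the previous licence forward unless a new one is given.
        if license is None and self.versions:
            license = self.versions[-1].license
        version = Version(len(self.versions) + 1, content, license)
        self.versions.append(version)
        return version

    @property
    def latest(self) -> Version:
        return self.versions[-1]


item = ContentItem("item-1")
item.add_version("draft transcription", license="CC BY")
item.add_version("revised transcription")
print(item.latest.number, item.latest.license)
```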

September 21, 2006

ECDL 2006 - DLSci06 - UQueensland eResearch projects

Jane Hunter
Next Generation Digital Library Services for eScience - Delivering on the Hype
U Queensland

http://www.itee.uq.edu.au/~jane/

http://www.itee.uq.edu.au/~eresearch/

[IMG_9799_crop.jpg]

Big List of eScience Middleware

DART project

http://www.itee.uq.edu.au/~eresearch/projects/dart/

[list of work packages]

* Secure annotation e.g. with Shibboleth?
* Integration of Fedora and Storage Resource Broker (SRB)

* Improving search interfaces

Case Studies

* microscopists
* protein crystallographers
* environmental/ecosystem scientists
* VIRGIL project

GRANI project for Nanostructural Analysis Network Organization

http://www.itee.uq.edu.au/~eresearch/projects/grani/

Challenges

[huge list]

eScience Workflow

http://kepler-project.org/

Kepler
BPEL4WS

Modelling eScience Provenance

Tools for scientific provenance, with different levels of views.

Scientific Model Package - everything wrapped up into data + publication ***

Stuff about teleobservation of experiments

DART - Annotation of Crystallographic Structures

You could almost automate the creation of a paper about a new structure - by integration with data.

[demo of advanced annotation of 3D crystallographic data structure]

Semantic WildNet

* integrate bird/snake sightings with climate sensor data with topographic data
* SPARQL query interface + Google Earth

Collaborative Tools

* Chat
* Videoconf
  - Access Grid Sessions
* Wikis, Blogs
* Shared Applications

Vannotea

http://www.itee.uq.edu.au/~eresearch/projects/vannotea/

Collaborative Multimedia Annotation
3D Object Annotation

Secure Annotation e.g. on eprints server

The annotation server is a sidebar for a browser

Q (me): What about preservation
A: We have a project that is addressing that

PANIC (Preservation webservices Architecture for Newmedia and Interactive Collections)

http://metadata.net/panic/

Q: Will this content be reusable?  The tools are very domain-specific.
A: Very aware of this issue.  Have already demonstrated reuse of content.  Developing core services.

Q: What are some shared pieces of infrastructure?

A: Workflow
Realtime Shared App
Data integration
E-lab notebooks

ECDL 2006 - DLSci06 - eScience Knowledge-based vision

Claudia Niederee, Thomas Risse
e-Science: A knowledge based vision

http://www.ipsi.fraunhofer.de/

e-Science in Transition

* a broader look now beyond just grid and storage

Digital Libraries

* have meanwhile reached a certain maturity

Possible synergies

* E-science functionality as a natural extension of the scientific digital library
* learn from DL experiences and best practices
* reduce the risk of "re-inventing the wheel"

Comments: Grid not a good match for libraries - collaboration is key, where we can participate.

Goal: scientific working place of the future that enables focusing on creative tasks -   
      knowledge-based e-Science infrastructure

Future Directions:
* From digital content to innovation resources (Resources)
* From document classification to domain and market understanding (Context)
* From collaboration tools to virtual teams (Collaboration)
* From digital libraries to virtual research environments (Interaction)
  - also intelligent services that take over routine tasks from the researcher
* From provision of documents to active support of creativity (Creativity)
  - extremely difficult to support with info. technology
* From information provision to support of the innovation process (Process)

Have to have lots of flexibility to support creativity.

Example: Systematic Re-use of Innovation Resources

Innovation Resources
* Tools and Services
* Expertise
* Scientific data
* Scientific documents
* Methods

Challenges
* Encourage the re-use of resources
* Discover adequate resources within the current working context
* Consideration of existing rights of use
* Automate the description of resources where possible

Rachel - citations and linking to data

* Establishment of an annotation pipeline for the (semi) automatic enrichment of resources
  (scientific data and services)

From data to enriched data (by describing the data with the programs, methods and parameters used to create it).

Needed: Pipeline for (semi) automatic enrichment - domain specific
        Ontologies and models for the description
        Extended editors for data integration
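
[my sketch, not from the talk] A rough idea of what "enriched data" could look like in code: the raw values are wrapped with the program, method and parameters that produced them. The class name, field names and sample values are all hypothetical.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class EnrichedData:
    """Raw data plus a description of how it was produced (hypothetical schema)."""
    values: list
    program: str                                    # software that generated the data
    method: str                                     # method or algorithm applied
    parameters: dict = field(default_factory=dict)  # settings used for the run


def enrich(values, program, method, **parameters):
    # Attach the provenance description to the raw values in one step.
    return EnrichedData(list(values), program, method, dict(parameters))


# Made-up example values, only to show the shape of an enriched record.
record = enrich([0.12, 0.15, 0.11], program="spectra-fit 1.2",
                method="least squares", window=5)
print(asdict(record))
```

A semi-automatic pipeline would fill in these fields from the tools themselves rather than by hand, which is where the domain-specific part comes in.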

e-Science Architecture Blueprint

[diagram]

FRESCO = Fraunhofer e-Science Cockpit

http://www.ipsi.fraunhofer.de/i-info/en/content/view/97/0/

Fraunhofer is very large, distributed organization - 58 institutes, 12400 employees

Fraunhofer e-Science Vision

* Support of the scientific innovation process for applied research by providing
[long list of things]

* Seamless and traceable integration of scientific data into the publication process

* Development of the Fraunhofer e-Science Infrastructure

Idea of e-Science Cockpit: Navigation within the innovation space

Infrastructure
* Integration basis for tech with content
* Provide base services and standardized interfaces
* Scalable and extensible

Did a big study with a questionnaire, as part of strongly user-oriented technology design

Found a high readiness for sharing data

Approach:
- Adequate technology launch
  - Stepwise integration into the working process

ECDL 2006 - Digital Library eScience programme

The preliminary programme for the Digital Library eScience workshop is available,
which is good, since it is about to start in a few minutes.

I will be blogging the presentations as much as possible.
Category DLSci06.

Program (Preliminary)

    09:00 - 10:30

  • Claudia Niederee, Thomas Risse
    e-Science: A knowledge based vision
  • Jane Hunter
    Next Generation Digital Library Services for eScience - Delivering on the Hype

10:30 - 11:00

  • Coffee break

11:00 - 13:00

  • Matthias Razum
    eSciDoc - A Scholarly Information and Communication Platform in the Age of eScience
  • Kwok Cheung and Jane Hunter
    Provenance Explorer - A Tool for Viewing Provenance Trails and Constructing Scientific Publication Packages
  • Andreas Aschenbrenner, Peter Gietz, Marc Wilhelm Küster, Christoph Ludwig and Heike Neuroth
    TextGrid - a modular platform for collaborative textual editing

13:00 - 14:30

  • Lunch break

14:30 - 16:30

  • Leonardo Candela, Donatella Castelli, Christoph Langguth, Pasquale Pagano, Heiko Schuldt, Manuele Simi and Laura Voicu
    On-Demand Service Deployment and Process Support in e-Science DLs: the DILIGENT Experience
  • Bhaskar Mehta and Peter Fankhauser
    To Grid or not to Grid: Digital libraries based on Grid infrastructure
  • Rachel Heery
    Concluding Summary
  • Discussions

September 20, 2006

ECDL 2006 - Keynote 3 - Google Books project at Stanford

(Withdrawn by request of Michael A. Keller)

ECDL 2006 - Session 10 - Next Generation Million Book Digital Libraries

Tuesday September 19, 2006
17:30 "Beyond Digital Incunabula: Modeling the Next Generation of Digital Libraries". Gregory Crane

This was an interesting presentation about the many ways in which we could mobilize digitized books - analyze and link the full-text in many ways.  A digital book should become much more than just a static PDF online - it should participate actively in a network of information.

http://www.perseus.tufts.edu/

http://dlib.anu.edu.au/dlib/march06/crane/03crane.html

http://ase.tufts.edu/faculty-guide/faculty.asp?id=gcrane

Separation of Content and Presentation

* extract chunks via XML

Recombinant Data

* Disassemble documents into pieces
* Recombine them on the fly

Dynamic Data

Books Talking to Each Other

Hybrid entity

Human/Machine/Services

Automatic Processes

e.g. Named Entity Analysis - figure out context of references to "Washington"
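
[my sketch, not from the talk] A toy version of the "Washington" problem: pick the sense whose cue words overlap most with the surrounding sentence. The cue-word lists are invented for illustration; a real system would learn them from a corpus.

```python
# Hypothetical cue words for each sense of "Washington".
SENSES = {
    "person": {"george", "president", "general", "army"},
    "city": {"d.c.", "capital", "congress", "potomac"},
    "state": {"seattle", "olympia", "pacific", "northwest"},
}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(sentence.lower().replace(",", " ").split())
    scores = {sense: len(cues & words) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue words there is no evidence, so return None.
    return best if scores[best] > 0 else None


print(disambiguate("General Washington led the army across the Delaware"))
print(disambiguate("She moved to Seattle, Washington last year"))
```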

Lexical Analysis

* doing analysis of mapping between language and its translations

New User Interactions

* Readers talk, books listen

* Personalization

Million Book Libraries

* Google Books
* Open Content Alliance
* i2010 - in planning

Compared to curated

* 10 times bigger
* 10 times more noise
* etc.

Technologies and Domains

* Three core technologies
  - page image to text
  - text to data
  - one language to another

Million Books Workshop (to be announced)

Boston USA
May 22-24, 2007

September 19, 2006

ECDL 2006 - keynote 2 - Yahoo determining intentionality from queries

11:00 Keynote 2

Queries and Clicks as a Source of Knowledge

Ricardo Baeza-Yates
Yahoo Research
Barcelona, Spain & Santiago, Chile

[IMG_9747.JPG]

http://www.dcc.uchile.cl/~rbaeza/

web (chaotic) - DL (ordered)

[overview of Yahoo sites, amount of info, types of data]

"Information Games" - win if you match tags
http://www.espgames.org/

Observed Data

* Query Logs
* Result/Web Clicks
* Advertising clicks
* Social

Talks about "safe" (trusted) vs "dangerous" (false, spammy) information/sources.

The Wisdom of Crowds

The Power of Social Media

Motivations for Web Mining

* The Dream of the Semantic Web
  - Obstacle: Us

* User Actions: Implicit Semantic Information
  - free
  - large volume
  - unbiased
  - can we capture it?
  - hypothesis: Queries are the best source

Mining Queries for...

* Improved Web Search
* User Driven Design
  - Information Scent
  - web site that users want
  - web site that you should have
  - improve content & structure
* Bootstrap of pseudo-semantic resources

Web Queries

* short queries & impatient interaction
* smaller and different vocab
* different user goals (Broder, 2000)
  - information need
  - navigational need
  - transactional need
* Refined by Rose & Levinson, WWW 2004

http://citeseer.ist.psu.edu/rose04understanding.html

Yahoo Mindset

http://mindset.research.yahoo.com/

Relevance of the Context

* moving to less information, more context

Context

* Who you are
* Where you are
* What you are doing

* Issues: privacy ...
* Sources: Web, CV, usage logs...
* Goals: personalization, localization, better ranking in general...

Context in Web Queries

* IP, time, location (based on IP), interaction history, task, OS, browser...

User Intention

* Kang & Kim, SIGIR 2003
  - their method was not effective: 60%

* Liu, Lee & Cho WWW 2005
  - prediction power 90%

Yahoo

* Manual classification of more than 6000 popular queries
  - query intention and topic
  - classification and clustering
  - machine learning

* Baeza-Yates ? 2006?

Results: You can do (machine learning?) classification of intention on information queries

Next step: Clustering Queries

* Define relations among queries
  - common words
  - common clicked URLs: works better = natural clusters
* define distance function among queries
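
[my sketch, not the Yahoo method, whose details I did not capture] One simple way to turn "common clicked URLs" into a distance function is Jaccard distance over the sets of clicked URLs, followed by a greedy clustering pass. The click log, the threshold and the function names below are all assumptions.

```python
def click_distance(urls_a, urls_b):
    """Jaccard distance between two sets of clicked URLs (0 = same, 1 = disjoint)."""
    a, b = set(urls_a), set(urls_b)
    if not a and not b:
        return 1.0  # no click evidence: treat the queries as unrelated
    return 1.0 - len(a & b) / len(a | b)


def cluster_queries(clicks, threshold=0.7):
    """Greedy single pass: join the first cluster whose seed query is close enough."""
    clusters = []  # each cluster is a list of queries; the first entry is its seed
    for query, urls in clicks.items():
        for cluster in clusters:
            if click_distance(urls, clicks[cluster[0]]) <= threshold:
                cluster.append(query)
                break
        else:
            clusters.append([query])
    return clusters


# Hypothetical click log: query -> URLs users clicked after issuing it.
clicks = {
    "cheap flights": {"kayak.example", "expedia.example"},
    "airline tickets": {"expedia.example", "kayak.example", "united.example"},
    "python tutorial": {"docs.python.example"},
}
print(cluster_queries(clicks))
```

Note that the two flight queries share no words at all, yet land in the same cluster because users clicked the same sites, which fits the remark that clicked URLs give more natural clusters than common words.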

Yahoo Approach

* Can we cluster queries well?
* Can we assign user goals to queries?

[details of method]

The user queries represent in a way the user view of your data/system.

Uses

* Improved ranking
* Word classification
  - e.g. synonyms in the same cluster
* Query recommendation (ranking of suggested queries)

Building Taxonomies

* Infer topics for queries that imply documents

Result: Automatic classification is better than (single or a small group of) humans! (but/because the auto classification is based on the actions of many many people)

Final Remarks

* Many potential uses of the wisdom of people

Q: Compare user pseudo-taxonomies to taxonomies autogenerated from text
A: probably they are quite different - people use different words (or a different majority set of words) in queries than they do in written text

Q (Rachel Heery UKOLN): can this pick up new terms (new words) being used?
A: working on this

September 18, 2006

ECDL 2006 - tutorial - Fedora

14:30

Tutorial 5
Fedora
Carl Lagoze and Sandy Payette

[raw presentation notes]

Also see my notes from last year's European Fedora User Meeting.

They are tagged under EuroFedora2005.

* Intro
* Digital Objects
* Repository Service
* 15:45 Service Framework in Focus
=== Break ===
* 16:00 Semantic Web and RDF
* RDF and Fedora
* Case Study: NSDL
* Future Directions

Problem Space

* Complex, compound, dynamic objects

Theme

* From Documents to Integrated Information Networks

"a network-based scholarly communications system"

Repositories situated at intersection of key social and technical trends

Technical Context:
* SOA
* Web 2.0
* Semantic Web

Sampling of Fedora Community

* ARROW and DART - Australia
*** eSciDoc - Max Planck
* DRC - OhioLink
* Danish Technical University (DTU)
* Wegener Institute, Polar/Marine, Germany

Are there IR clients for Fedora?

** Fez http://sourceforge.net/projects/fez "DSpace plus"
* VALET http://www.valet.vtls.com/
* Elated http://elated.sourceforge.net/
* FIRE

Digital Object Model

Disseminators - metadata about things you can use [my summary] - a disseminator can tell Fedora how
to connect to a Web Service, but a disseminator is not a Web Service itself

Fedora Repository Service

* Exposes through SOAP and REST:
  - Manage (Ingest, Export, Validate, Version)
  - Access (Get)
  - Registry Search
  - Resource Index

Fedora Security Architecture

[various stuff]
* Shibboleth-to-Fedora servlet filter

XACML Policy

Preservation Support

[various stuff]
* Preservation Support Services (forthcoming 2006-2007) - being defined by working group

There are some performance limitations in triple-stores
- Fedora uses the Kowari triple-store
- NSDL is storing ~200 million triples?

Fedora and the Semantic Web

Motivation

* exposing repository as a network of objects
  - relationships
  - query the graph; discovery of related stuff
* indexing based on generalizable data model
* extensible enrichment of object descriptions
* inferencing from structure of graph
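
[my sketch, not Fedora's API] To make "query the graph" concrete, here is a tiny in-memory set of subject/predicate/object triples with a wildcard match and a "related stuff" lookup. The object identifiers and relationship names are made up, and a real repository would use a proper triple-store and query language.

```python
# Hypothetical object identifiers and relationship names, for illustration only.
triples = {
    ("article:1", "isPartOf", "journal:7"),
    ("article:2", "isPartOf", "journal:7"),
    ("dataset:9", "isDataFor", "article:1"),
}


def match(pattern, store=triples):
    """Return triples matching (subject, predicate, object); None is a wildcard."""
    return sorted(t for t in store
                  if all(p is None or p == v for p, v in zip(pattern, t)))


def related(obj, store=triples):
    """Discover everything directly linked to an object, in either direction."""
    outgoing = {o for s, _, o in store if s == obj}
    incoming = {s for s, _, o in store if o == obj}
    return sorted(outgoing | incoming)


print(match((None, "isPartOf", "journal:7")))  # all articles in the journal
print(related("article:1"))                    # its journal and its dataset
```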

What are the applications?

* Digital libraries with structured objects
* Publishing Systems
  - Journals with Articles
* eScience
  - Text with Datasets
* Semantic networks

[There was more but I switched back to the other workshop.]

== end ==

ECDL 2006 - tutorial - Distributed Infrastructures for Digital Libraries

Sunday September 17, 2006
09:30
Tutorial 4
Distributed Infrastructures for Digital Libraries

I liked this workshop a lot. I hadn't really thought about DLs in relation to P2P and Grid, and I liked the idea that it may be possible to use aspects of all of them.

[raw presentation notes]

Start

Digital library mediates between community and content.

* Core functionality of a DL is well-understood
* Standards have been established
* DL management systems are in operation e.g. DSpace, OpenDLib

Evolution of DL

* wider clientele including scientific collaboration
* competing technologies including web search engines
* technology change: grid, p2p, SOA, Semantic Web
* new types of content including blogs, dynamic content, scientific data

Scenario 1 - identifying archeological [] discoveries made by the public
* traditional way is very complex and time-consuming for the archeologist []

Scenario 2 - Environmental Incidents
* can you set up a virtual DL on demand, including all needed data and simulations?

Next Gen DL (NGDL)
* dynamic configurable federation

The future of DLs - services

Specialized services
* Search
  - Different media types
  - Content-based
  - Multi-object, multi-feature
  - Multilingual access
  - Relevance feedback
* Indexing
* Annotation
* Metadata management
* Content management
* Resource management

Requirements:

Virtual DL
* Easy to extend
* Example: collaboration in eScience applications

Management of services which are:
* Distributed
* Heterogeneous
* [?]

Composition of services
* Defining complex services / processes / workflows
* Flexibility
* Example: complex processes for automated storage and replication of data, generation of meta data (content features)

More
* Personalization
* Visualization
* Access on mobile devices
* Context- and location-aware services
* AuthN and AuthZ
* "High availability" [quotes mine] - Access anytime - replication
* Reliability
* Scalability
* etc-ability [my comment]
* Dynamic / Continuously generated data
  - example: data generated by certain instruments in eScience

Q (me): How / where to work on standardizing service interfaces, so that we can
incorporate them into workflows?

A: will be covered in Web Services presentation

Underlying Technologies and their Promises
Thomas Risse ?

Service-Oriented Architectures (SOA)

Web Service Model

Service Provider publishes service description to Service Broker
Service Requester requests service list from Service Broker
Service Requester then binds to Service Provider
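
[my sketch, not from the slides] The publish / find / bind triangle can be mimicked in a few lines with an in-memory broker. The class, the method names and the sample service are all hypothetical; in a real SOA these steps go over the network using service descriptions and a registry.

```python
class ServiceBroker:
    """In-memory stand-in for a service registry."""

    def __init__(self):
        self._registry = {}

    def publish(self, name, description, endpoint):
        # Step 1: the provider publishes its service description.
        self._registry[name] = {"description": description, "endpoint": endpoint}

    def find(self, keyword):
        # Step 2: the requester asks the broker for matching services.
        return [name for name, entry in self._registry.items()
                if keyword.lower() in entry["description"].lower()]

    def lookup(self, name):
        # Hand back the provider's endpoint so the requester can bind to it.
        return self._registry[name]["endpoint"]


def search_service(query):
    """Hypothetical provider-side operation."""
    return f"results for {query!r}"


broker = ServiceBroker()
broker.publish("dl-search", "Full-text search over a digital library", search_service)

matches = broker.find("search")        # requester discovers the service
endpoint = broker.lookup(matches[0])   # ...then binds to the provider
print(endpoint("grid computing"))      # Step 3: invoke without going through the broker
```

The point of the pattern is that the broker is only involved in discovery; once bound, requester and provider talk directly.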

Elements of SOA

[diagram I don't agree with]

Web Services Stack

[diagram]

WS-BPEL "already widely used, lots of applications"

* Above BPEL - Coordination (WS-Coordination, WS-AtomicTransaction, WS-Notification...) [not mature]
* Security
* Management (WSDM)
* Contracting (trading partner agreement - paper contract)

Challenge: Semantic standards are still in development

Grid Computing: An Application of SOA

Software: Globus Toolkit...

[Globus architecture diagram]

which leads to... Open Grid Service Architecture (OGSA)

Idea: Service orientation to virtualize resources
* Extended Web Services -> Grid Services - Web Service Resource Framework (WSRF)

[OGSA Architecture diagram]

WSRF

* Unified way to model and interact with stateful Web Services

Peer-to-Peer Computing

Summary

== ==

[Overview of Motivation and the three projects]

Web Services and Distributed DL Infrastructures

Challenges
* granularity of services
* Semantics of services

BRICKS, Diligent and DELOS

BRICKS
* transparent access to distributed available information sources
* retrieval of info with knowledge support
* multi-lingual
* easy
* platform independent

BRICKS Approach

* SOA
* Decentralized
* Open Source

BRICKS Node (BNode)

P2P network of BNodes

Budget: 12.2 million Euros
project nearing completion ?
http://www.brickscommunity.org/

The Diligent Project

Diligent develops a DL test-bed infrastructure that allows virtual organizations to create
on-demand DL including computing, storage, multi-content support, app resources.

use a grid computing infrastructure for resource allocation

Key Concepts

* SOA
* Integrating DL services on infrastructure from Enabling Grids for E-sciencE project (EGEE)
* Enhances existing grid services with complex service interactions required to build, operate and
  maintain transient virtual digital libraries

Diligent Architecture

[diagram]

Budget: 9.55 million Euros
http://www.diligentproject.org/

The DELOS Project

Network of Excellence to coordinate the development of next generation digital libraries.

[huge project]

* Reference Model for DLs
* Architectures

Delos DLMS

* Specialized DL functionality from DELOS and non-DELOS partners is made available and integrated by means
  of (Web) services

OSIRIS: Integration of Services

http://www.delos.info/

Q (me): So there are three frameworks, which one do I use?  All three?
A: Err, yes.  Still in early stages.

== left ==

There was info on how the systems do content and metadata management, while
I was at another tutorial.

== back ==

BRICKS

* Personalization

User profiles add customization on top of e.g. Content, Presentation, Services, Interaction.

Model the User

* many aspects
  - will focus on preferences

[a lot of details about how to capture user interest in e.g. keywords and their relationships]

Personalization Approaches

Recommenders: Content-based Recommenders

- "find me things that are related to things I have liked in the past"

Recommenders: Collaborative Filtering

- find objects similar PEOPLE have liked
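
[my sketch, not from the tutorial] The two recommender styles side by side on a made-up data set. Content-based ranks unseen items by keyword overlap with what the user already liked; collaborative filtering looks at what the most similar other user liked. Jaccard similarity is my choice here for simplicity, and all names and data are hypothetical.

```python
# Hypothetical data: which items each user liked, and keywords describing each item.
likes = {
    "ann": {"plato", "aristotle"},
    "bob": {"plato", "aristotle", "kant"},
    "cat": {"java-guide"},
}
keywords = {
    "plato": {"philosophy", "greek"},
    "aristotle": {"philosophy", "greek", "logic"},
    "kant": {"philosophy", "german"},
    "java-guide": {"programming", "java"},
}


def jaccard(a, b):
    """Overlap between two sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def content_based(user):
    """Rank unseen items by keyword overlap with everything the user liked."""
    profile = set().union(*(keywords[i] for i in likes[user]))
    unseen = [i for i in keywords if i not in likes[user]]
    return sorted(unseen, key=lambda i: jaccard(profile, keywords[i]), reverse=True)


def collaborative(user):
    """Recommend items liked by the most similar other user."""
    others = [u for u in likes if u != user]
    nearest = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return sorted(likes[nearest] - likes[user])


print(content_based("ann"))
print(collaborative("ann"))
```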

What BRICKS provides is Personalized Search

e.g. Java = coffee vs. Java = programming language

Architecture

Query -> Query Personalization -> Result Ranking

Query Personalization

* dynamic enhancement of query using preferences from a user profile
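
[my sketch, not the BRICKS implementation] Query personalization in miniature, using the Java example: the query is expanded with the user's strongest preference terms for each query word. The profile layout (term -> related term -> weight) and the function name are assumptions.

```python
def personalize(query, profile, max_terms=2):
    """Append the user's strongest preference terms related to the query.

    `profile` maps a query term to {related term: weight}; this layout is an
    assumption made for the example.
    """
    query_words = query.lower().split()
    extra = []
    for word in query_words:
        prefs = profile.get(word, {})
        ranked = sorted(prefs, key=prefs.get, reverse=True)
        extra.extend(ranked[:max_terms])
    # Keep the original query first and skip terms it already contains.
    return " ".join([query] + [t for t in extra if t not in query_words])


programmer = {"java": {"programming": 0.9, "language": 0.7, "coffee": 0.1}}
barista = {"java": {"coffee": 0.8, "espresso": 0.6}}

print(personalize("java tutorial", programmer))
print(personalize("java tutorial", barista))
```

The same query ends up leaning toward the programming language for one profile and toward coffee for the other, before any result ranking happens.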

Personalization in BRICKS

[diagram of Personalization Manager interacting with many foundational bricks]

User Model

* Term preferences (e.g. Pisa)
* Physical collection
* Ontology classes
* Attribute (e.g. author="Plato")

All with vectors of terms.

User Profiling

* Transparent user-action tracking and profile update

Currently support

* query personalization
* results ranking

Profiles are local and all processing is local.
The profiles are not distributed.  (Eventually they will be.)

DELOS Search [and Annotations]

* Image similarity
* Audio retrieval
* Video retrieval
* 3D retrieval

Visualization and Interfaces

Collections, Annotations, Personalization and Search in DILIGENT

* Collection is the ultimate source for search
* Physical and Virtual collections (virtual - constructed by query)

Personalization

* Manage and maintain user profiles
* Query personalization

currently there is only manual assignment of profiles,
eventually there will be user behavior tracking

The DILIGENT Search Engine

* Composed of a distributed set of services communicating via WS-* protocols being
  orchestrated under a master component

... [lots of detail]

Query execution plans are ultimately converted to Process Execution Engine (PES) compliant
workflows (BPEL4WS).

Workflow is submitted to PES

[very complex queries are supported]

Lessons Learned

* big complex subject with ongoing investigations

General Methodology Lessons

* Communication of the concept of distributed architectures to the end user is a hard process
* Early prototyping is helpful
* All three technologies (SOA, P2P, Grid) and their implementations are still evolving
* Developers and system designers have
  - a long learning curve
  - difficulties in implementing a DL on top of these three technologies

When you work with librarians, they can't dream hard enough = they don't know what is possible.
Technologists have to work hard with librarians to make them understand what can be done.

General Tech Lessons

* The three technologies are not orthogonal
* All three are relevant to future DLs
* No tech can meet the requirements of a DL alone
* But you don't need all technologies always for everything - select as appropriate

Specific Tech Lessons - SOA

* Important
  - functionality can be combined across network borders
* A service-oriented design is different from traditional design
  - e.g. the number of functions should be limited, due to [various] considerations
  - service invocations are NOT like LAN remote procedure calls
* Communication costs between services are often underestimated

Tech Lessons - P2P

* Beneficial when you have a lot of data
  - supports redundancy
* Increased complexity in content security, access control, ...

Tech Lessons - Grid

* It gives you high computing power
* Critical for supporting computationally-intensive tasks
* Beneficial when you have a lot of data
  - supports redundancy
* Not really targeting interactive or real-time applications but rather batch,
  long-running processing tasks
* Fine-grained security complicates planning query execution
* (Provides a) standardized mechanism for on-the-spot processing and exchanging large
  amounts of XML data
* Despite its current shortcomings, WSRF offers an elegant substrate for building
  a dynamic distributed system based on standards
* Failure to carefully plan complex workflows ... might lead to ... execution failures
* Porting database concepts and workflow to DLs opens opportunities for distributed information retrieval

[presented by Yannis from University of Athens]

Conclusions and Open Issues

Currently...

[diagram of current digital library silos]

Content-centric, static storage, isolated, environment-specific, isolated and repeated efforts

Myths

DL for library only
DL for cultural heritage only

Future Development Methodology

[diagram of shared generic DL stuff]

* Generic DLMS tech to build on

Future DL

* Person-centric
* Targeted for active communication/collaboration
* Global distributed interacting systems
* Generic DLMS to build on
* "All" applications (not just libraries and cultural heritage)

Develop from top-down, from the user interface / user requirements down
Interop between DLs

Conclusions

* Still early in the game
* These systems can serve as first versions / early prototypes - they are not ready for industrial production

Open Issues

* Grand unified theory of SOA, P2P and Grid
* Grand unification of BRICKS, Delos, DILIGENT...
* Standards
* Tons of research issues
  - distributed search and workflow optimization
  - distributed information fusion
  - intelligent caching of information and processing state
  - intelligent placement and replication of information and processing
  - ...

* two more projects
  - BELIEF - how to exploit "knowledge infrastructure"
  - DRIVER - distributed access to international repositories

Previously:
January 15, 2005  DELOS - digital library architecture
