
September 18, 2006

ECDL 2006 - tutorial - Distributed Infrastructures for Digital Libraries

Sunday September 17, 2006
09:30
Tutorial 4
Distributed Infrastructures for Digital Libraries

I liked this workshop a lot. I hadn't really thought about DLs in relation to P2P and Grid, and I liked the idea that it may be possible to use aspects of all of them.

[raw presentation notes]

Start

Digital library mediates between community and content.

* Core functionality of a DL is well-understood
* Standards have been established
* DL management systems are in operation e.g. DSpace, OpenDLib

Evolution of DL

* wider clientele including scientific collaboration
* competing technologies including web search engines
* technology change: grid, p2p, SOA, Semantic Web
* new types of content including blogs, dynamic content, scientific data

Scenario 1 - identifying archeological [] discoveries made by the public
* traditional way is very complex and time-consuming for the archeologist []

Scenario 2 - Environmental Incidents
* can you set up a virtual DL on demand, including all needed data and simulations?

Next Gen DL (NGDL)
* dynamic configurable federation

The future of DLs - services

Specialized services
* Search
  - Different media types
  - Content-based
  - Multi-object, multi-feature
  - Multilingual access
  - Relevance feedback
* Indexing
* Annotation
* Metadata management
* Content management
* Resource management

Requirements:

Virtual DL
* Easy to extend
* Example: collaboration in eScience applications

Management of services which are:
* Distributed
* Heterogeneous
* [?]

Composition of services
* Defining complex services / processes / workflows
* Flexibility
* Example: complex processes for automated storage and replication of data, generation of meta data (content features)

More
* Personalization
* Visualization
* Access on mobile devices
* Context- and location-aware services
* AuthN and AuthZ
* "High availability" [quotes mine] - Access anytime - replication
* Reliability
* Scalability
* etc-ability [my comment]
* Dynamic / Continuously generated data
  - example: data generated by certain instruments in eScience

Q (me): How / where to work on standardizing service interfaces, so that we can
incorporate them into workflows?

A: will be covered in Web Services presentation

Underlying Technologies and their Promises
Thomas Risse ?

Service-Oriented Architectures (SOA)

Web Service Model

Service Provider publishes service description to Service Broker
Service Requester requests service list from Service Broker
Service Requester then binds to Service Provider
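
[my sketch, not from the slides] The publish / find / bind triangle above can be illustrated with a toy in-memory broker. Every name here (ServiceDescription, ServiceBroker, bind, the "keyword-search" service) is made up for illustration, and a plain Python callable stands in for a network endpoint; a real setup would use WSDL descriptions and a registry such as UDDI.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceDescription:
    name: str                        # hypothetical service name
    category: str                    # e.g. "search", "annotation"
    endpoint: Callable[[str], str]   # stand-in for a remote endpoint


@dataclass
class ServiceBroker:
    """Toy registry: providers publish, requesters look up by category."""
    _registry: Dict[str, ServiceDescription] = field(default_factory=dict)

    def publish(self, description: ServiceDescription) -> None:
        self._registry[description.name] = description

    def find(self, category: str) -> List[ServiceDescription]:
        return [d for d in self._registry.values() if d.category == category]


def bind(description: ServiceDescription) -> Callable[[str], str]:
    """Binding here just returns the callable; over a network it would
    mean creating a client stub for the provider's endpoint."""
    return description.endpoint


# 1. Provider publishes its description to the broker.
broker = ServiceBroker()
broker.publish(
    ServiceDescription("keyword-search", "search", lambda q: f"results for {q}")
)

# 2. Requester asks the broker for matching services.
matches = broker.find("search")

# 3. Requester binds to the provider and calls it directly.
search = bind(matches[0])
print(search("digital libraries"))
```

Note that the broker is only involved in discovery; the actual call goes straight from requester to provider.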

Elements of SOA

[diagram I don't agree with]

Web Services Stack

[diagram]

WS-BPEL "already widely used, lots of applications"

* Above BPEL - Coordination (WS-Coordination, WS-AtomicTransaction, WS-Notification...) [not mature]
* Security
* Management (WSDM)
* Contracting (trading partner agreement - paper contract)

Challenge: Semantic standards are still in development

Grid Computing: An Application of SOA

Software: Globus Toolkit...

[Globus architecture diagram]

which leads to... Open Grid Service Architecture (OGSA)

Idea: Service orientation to virtualize resources
* Extended Web Services -> Grid Services - Web Service Resource Framework (WSRF)

[OGSA Architecture diagram]

WSRF

* Unified way to model and interact with stateful Web Services
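
[my sketch] As I understand the idea, the service itself stays stateless and each call names the stateful resource it operates on. The class and method names below (CounterService, create_resource, get_property, destroy) are hypothetical and only mimic the flavor of WSRF resource properties and lifetime management; this is not the WSRF API.

```python
import itertools
from typing import Dict


class CounterService:
    """Stateless front end; all state lives in resources addressed by key."""

    def __init__(self) -> None:
        self._resources: Dict[str, Dict[str, int]] = {}
        self._ids = itertools.count(1)

    def create_resource(self) -> str:
        """Create a new stateful resource and return its key."""
        key = f"res-{next(self._ids)}"
        self._resources[key] = {"value": 0}
        return key

    def add(self, key: str, amount: int) -> int:
        """Operate on the resource identified by key."""
        resource = self._resources[key]
        resource["value"] += amount
        return resource["value"]

    def get_property(self, key: str, name: str) -> int:
        """Read one named property of a resource."""
        return self._resources[key][name]

    def destroy(self, key: str) -> None:
        """Explicitly end the resource's lifetime."""
        del self._resources[key]


service = CounterService()
first = service.create_resource()
second = service.create_resource()
service.add(first, 2)
service.add(first, 3)
service.add(second, 10)
print(service.get_property(first, "value"), service.get_property(second, "value"))
```

Two clients holding different keys see independent state even though they talk to the same service.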

Peer-to-Peer Computing

Summary

== ==

[Overview of Motivation and the three projects]

Web Services and Distributed DL Infrastructures

Challenges
* granularity of services
* Semantics of services

BRICKS, Diligent and DELOS

BRICKS
* transparent access to distributed available information sources
* retrieval of info with knowledge support
* multi-lingual
* easy
* platform independent

BRICKS Approach

* SOA
* Decentralized
* Open Source

BRICKS Node (BNode)

P2P network of BNodes

Budget: 12.2 million Euros
project nearing completion ?
http://www.brickscommunity.org/

The Diligent Project

Diligent develops a DL test-bed infrastructure that allows virtual organizations to create
on-demand DL including computing, storage, multi-content support, app resources.

use a grid computing infrastructure for resource allocation

Key Concepts

* SOA
* Integrating DL services on infrastructure from Enabling Grids for E-sciencE project (EGEE)
* Enhances existing grid services with complex service interactions required to build, operate and
  maintain transient virtual digital libraries

Diligent Architecture

[diagram]

Budget: 9.55 million Euros
http://www.diligentproject.org/

The DELOS Project

Network of Excellence to coordinate the development of next generation digital libraries.

[huge project]

* Reference Model for DLs
* Architectures

Delos DLMS

* Specialized DL functionality from DELOS and non-DELOS partners is made available and integrated by means
  of (Web) services

OSIRIS: Integration of Services

http://www.delos.info/

Q (me): So there are three frameworks, which one do I use?  All three?
A: Err, yes.  Still in early stages.

== left ==

There was info on how the systems do content and metadata management, while
I was at another tutorial.

== back ==

BRICKS

* Personalization

User profiles add customization on top of e.g. Content, Presentation, Services, Interaction.

Model the User

* many aspects
  - will focus on preferences

[a lot of details about how to capture user interest in e.g. keywords and their relationships]

Personalization Approaches

Recommenders: Content-based Recommenders

- "find me things that are related to things I have liked in the past"

Recommenders: Collaborative Filtering

- find objects similar PEOPLE have liked
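
[my sketch] The difference between the two recommender styles can be shown with a toy example. The data (item_keywords, likes) and the choice of Jaccard overlap as the similarity measure are my assumptions, not something from the presentation.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: keywords per item, and the items each user liked.
item_keywords: Dict[str, Set[str]] = {
    "doc1": {"grid", "computing"},
    "doc2": {"grid", "workflow"},
    "doc3": {"coffee", "roasting"},
}
likes: Dict[str, Set[str]] = {
    "alice": {"doc1"},
    "bob": {"doc1", "doc3"},
}


def content_based(user: str) -> List[str]:
    """Rank unseen items by similarity to the CONTENT the user liked."""
    profile = set().union(*(item_keywords[i] for i in likes[user]))
    candidates = [i for i in item_keywords if i not in likes[user]]
    return sorted(
        candidates,
        key=lambda i: jaccard(profile, item_keywords[i]),
        reverse=True,
    )


def collaborative(user: str) -> List[str]:
    """Rank unseen items by what similar PEOPLE liked; content is ignored."""
    scores: Dict[str, float] = {}
    for other, liked in likes.items():
        if other == user:
            continue
        similarity = jaccard(likes[user], liked)
        for item in liked - likes[user]:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted(scores, key=scores.get, reverse=True)


print(content_based("alice"))
print(collaborative("alice"))
```

For alice, the content-based ranking puts doc2 first (it shares the keyword "grid" with doc1), while the collaborative one suggests doc3 because bob, who also liked doc1, liked it.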

What BRICKS provides is Personalized Search

e.g. Java = coffee vs. Java = programming language

Architecture

Query -> Query Personalization -> Result Ranking

Query Personalization

* dynamic enhancement of query using preferences from a user profile
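
[my sketch] A minimal illustration of the Query -> Query Personalization -> Result Ranking pipeline, using the Java example from above. The profile structure, the threshold, and the function names (personalize, rank) are my own assumptions; BRICKS itself surely does something more elaborate.

```python
from typing import Dict, List

# Hypothetical profile: query term -> {related term: preference weight}.
profile: Dict[str, Dict[str, float]] = {
    "java": {"programming": 0.9, "coffee": 0.1},
}


def personalize(
    query: List[str],
    user_profile: Dict[str, Dict[str, float]],
    threshold: float = 0.5,
) -> List[str]:
    """Add strongly preferred related terms to the query."""
    expanded = list(query)
    for term in query:
        related = user_profile.get(term.lower(), {})
        for extra, weight in related.items():
            if weight >= threshold and extra not in expanded:
                expanded.append(extra)
    return expanded


def rank(results: List[str], expanded_query: List[str]) -> List[str]:
    """Order results by how many expanded-query terms they contain."""
    def score(text: str) -> int:
        words = text.lower().split()
        return sum(words.count(term.lower()) for term in expanded_query)

    return sorted(results, key=score, reverse=True)


results = [
    "Java coffee beans from Indonesia",
    "Java programming tutorial",
]
expanded = personalize(["Java"], profile)
ranked = rank(results, expanded)
print(expanded)
print(ranked)
```

With this profile the query becomes ["Java", "programming"], so the programming tutorial is ranked ahead of the coffee result.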

Personalization in BRICKS

[diagram of Personalization Manager interacting with many foundational bricks]

User Model

* Term preferences (e.g. Pisa)
* Physical collection
* Ontology classes
* Attribute (e.g. author="Plato")

All with vectors of terms.

User Profiling

* Transparent user-action tracking and profile update

Currently support

* query personalization
* results ranking

Profiles are local and all processing is local.
The profiles are not distributed.  (Eventually they will be.)

DELOS Search [and Annotations]

* Image similarity
* Audio retrieval
* Video retrieval
* 3D retrieval

Visualization and Interfaces

Collections, Annotations, Personalization and Search in DILIGENT

* Collection is the ultimate source for search
* Physical and Virtual collections (virtual - constructed by query)
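
[my sketch] One way to picture the physical / virtual distinction: a physical collection stores items, while a virtual collection stores only a query and evaluates it against its sources on demand. The class names and the predicate-style query are my assumptions, not DILIGENT's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Item:
    identifier: str
    metadata: Dict[str, str]


class PhysicalCollection:
    """Holds the items themselves."""

    def __init__(self, items: Iterable[Item] = ()) -> None:
        self._items: List[Item] = list(items)

    def add(self, item: Item) -> None:
        self._items.append(item)

    def items(self) -> List[Item]:
        return list(self._items)


class VirtualCollection:
    """Holds only a query; membership is computed when asked."""

    def __init__(
        self,
        sources: List[PhysicalCollection],
        query: Callable[[Item], bool],
    ) -> None:
        self._sources = sources
        self._query = query

    def items(self) -> List[Item]:
        return [
            item
            for source in self._sources
            for item in source.items()
            if self._query(item)
        ]


maps = PhysicalCollection([Item("m1", {"type": "map", "region": "Tuscany"})])
photos = PhysicalCollection([Item("p1", {"type": "photo", "region": "Tuscany"})])

tuscany = VirtualCollection(
    [maps, photos], lambda item: item.metadata.get("region") == "Tuscany"
)
before = [item.identifier for item in tuscany.items()]

# New content added to a source shows up in the virtual collection.
photos.add(Item("p2", {"type": "photo", "region": "Tuscany"}))
after = [item.identifier for item in tuscany.items()]
print(before, after)
```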

Personalization

* Manage and maintain user profiles
* Query personalization

currently there is only manual assignment of profiles,
eventually there will be user behavior tracking

The DILIGENT Search Engine

* Composed of a distributed set of services that communicate via WS-* protocols and are
  orchestrated by a master component

... [lots of detail]

Query execution plans are ultimately converted to Process Execution Engine (PES) compliant
workflows (BPEL4WS).

Workflow is submitted to PES

[very complex queries are supported]

Lessons Learned

* big complex subject with ongoing investigations

General Methodology Lessons

* Communication of the concept of distributed architectures to the end user is a hard process
* Early prototyping is helpful
* All three technologies (SOA, P2P, Grid) and their implementations are still evolving
* Developers and system designers have
  - a long learning curve
  - difficulties in implementing a DL on top of these three technologies

When you work with librarians, they can't dream hard enough = they don't know what is possible.
Technologists have to work hard with librarians to make them understand what can be done.

General Tech Lessons

* The three technologies are not orthogonal
* All three are relevant to future DLs
* No tech can meet the requirements of a DL alone
* But you don't need all technologies always for everything - select as appropriate

Specific Tech Lessons - SOA

* Important
  - functionality can be combined across network borders
* A service-oriented design is different from traditional design
  - e.g. the number of functions should be limited, due to [various] considerations
  - service invocations are NOT like LAN remote procedure calls
* Communication costs between services are often underestimated
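
[my sketch] A back-of-the-envelope model shows why communication costs get underestimated. The numbers (50 ms overhead per remote call, 0.1 ms of real work per item) are invented for illustration; only the shape of the result matters.

```python
def total_time_us(calls: int, overhead_per_call_us: int,
                  work_per_item_us: int, items: int) -> int:
    """Total time in microseconds: fixed cost per call plus work per item."""
    return calls * overhead_per_call_us + items * work_per_item_us


ITEMS = 1000
OVERHEAD_US = 50_000   # assumed cost of one remote invocation (50 ms)
WORK_US = 100          # assumed processing cost per item (0.1 ms)

# Fine-grained interface: one remote call per item, as if it were a local call.
chatty = total_time_us(ITEMS, OVERHEAD_US, WORK_US, ITEMS)

# Coarse-grained interface: the same items sent in 10 batches of 100.
batched = total_time_us(10, OVERHEAD_US, WORK_US, ITEMS)

print(f"chatty:  {chatty / 1_000_000:.1f} s")
print(f"batched: {batched / 1_000_000:.1f} s")
```

Under these assumptions the per-call overhead dominates completely, which is one argument for keeping the number of service functions small and their granularity coarse.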

Tech Lessons - P2P

* Beneficial when you have a lot of data
  - supports redundancy
* Increased complexity in content security, access control, ...

Tech Lessons - Grid

* It gives you high computing power
* Critical for supporting computationally-intensive tasks
* Beneficial when you have a lot of data
  - supports redundancy
* Not really targeting interactive or real-time applications but rather batch,
  long-running processing tasks
* Fine-grained security complicates planning query execution
* (Provides a) standardized mechanism for on-the-spot processing and exchanging large
  amounts of XML data
* Despite its current shortcomings, WSRF offers an elegant substrate for building
  a dynamic distributed system based on standards
* Failure to carefully plan complex workflows ... might lead to ... execution failures
* Porting database concepts and workflow to DLs opens opportunities for distributed information retrieval

[presented by Yannis from University of Athens]

Conclusions and Open Issues

Currently...

[diagram of current digital library silos]

Content-centric, static storage, isolated, environment-specific, isolated and repeated efforts

Myths

DL for library only
DL for cultural heritage only

Future Development Methodology

[diagram of shared generic DL stuff]

* Generic DLMS tech to build on

Future DL

* Person-centric
* Targeted for active communication/collaboration
* Global distributed interacting systems
* Generic DLMS to build on
* "All" applications (not just libraries and cultural heritage)

Develop from top-down, from the user interface / user requirements down
Interop between DLs

Conclusions

* Still early in the game
* These systems can serve as first versions / early prototypes - they are not ready for industrial production

Open Issues

* Grand unified theory of SOA, P2P and Grid
* Grand unification of BRICKS, Delos, DILIGENT...
* Standards
* Tons of research issues
  - distributed search and workflow optimization
  - distributed information fusion
  - intelligent caching of information and processing state
  - intelligent placement and replication of information and processing
  - ...

* two more projects
  - BELIEF - how to exploit "knowledge infrastructure"
  - DRIVER - distributed access to international repositories

Previously:
January 15, 2005  DELOS - digital library architecture
