Sunday September 17, 2006
09:30
Tutorial 4
Distributed Infrastructures for Digital Libraries
I liked this workshop a lot. I hadn't really thought about DLs in relation to P2P and Grid, and I liked the idea that it may be possible to use aspects of all of them.
[raw presentation notes]
Start
Digital library mediates between community and content.
* Core functionality of a DL is well-understood
* Standards have been established
* DL management systems are in operation e.g. DSpace, OpenDLib
Evolution of DL
* wider clientele including scientific collaboration
* competing technologies including web search engines
* technology change: grid, p2p, SOA, Semantic Web
* new types of content including blogs, dynamic content, scientific data
Scenario 1 - identifying archeological [] discoveries made by the public
* traditional way is very complex and time-consuming for the archeologist []
Scenario 2 - Environmental Incidents
* can you set up a virtual DL on demand, including all needed data and simulations?
Next Gen DL (NGDL)
* dynamic configurable federation
The future of DLs - services
Specialized services
* Search
- Different media types
- Content-based
- Multi-object, multi-feature
- Multilingual access
- Relevance feedback
* Indexing
* Annotation
* Metadata management
* Content management
* Resource management
Requirements:
Virtual DL
* Easy to extend
* Example: collaboration in eScience applications
Management of services which are:
* Distributed
* Heterogeneous
* [?]
Composition of services
* Defining complex services / processes / workflows
* Flexibility
* Example: complex processes for automated storage and replication of data, generation of metadata (content features)
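[my sketch, not from the slides] A rough illustration of what composing services into a workflow could look like, using the storage / replication / metadata example above. The step functions (`store`, `replicate`, `extract_features`) and the `compose` helper are made-up stand-ins for real distributed services; an actual system would describe this in something like BPEL rather than plain function calls.

```python
from typing import Callable, Dict, List

# A workflow step takes a record and returns an enriched copy of it.
Step = Callable[[Dict], Dict]


def store(item: Dict) -> Dict:
    # Hypothetical storage service: just marks the item as stored.
    return {**item, "stored": True}


def replicate(item: Dict) -> Dict:
    # Hypothetical replication service: records a replica count.
    return {**item, "replicas": 2}


def extract_features(item: Dict) -> Dict:
    # Hypothetical metadata service: derives a trivial content feature.
    words = item["content"].split()
    return {**item, "metadata": {"word_count": len(words)}}


def compose(steps: List[Step]) -> Step:
    """Chain steps so the output of one becomes the input of the next."""
    def workflow(item: Dict) -> Dict:
        for step in steps:
            item = step(item)
        return item
    return workflow


ingest = compose([store, replicate, extract_features])
result = ingest({"id": "doc-1", "content": "grid services for digital libraries"})
print(result)
```

Swapping, adding or removing a step only changes the list passed to `compose`, which is roughly the flexibility the slide was asking for.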
More
* Personalization
* Visualization
* Access on mobile devices
* Context- and location-aware services
* AuthN and AuthZ
* "High availability" [quotes mine] - Access anytime - replication
* Reliability
* Scalability
* etc-ability [my comment]
* Dynamic / Continuously generated data
- example: data generated by certain instruments in eScience
Q (me): How / where to work on standardizing service interfaces, so that we can
incorporate them into workflows?
A: will be covered in Web Services presentation
Underlying Technologies and their Promises
Thomas Risse ?
Service-Oriented Architectures (SOA)
Web Service Model
Service Provider publishes service description to Service Broker
Service Requester requests service list from Service Broker
Service Requester then binds to Service Provider
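[my sketch, not from the slides] The publish / find / bind triangle above, reduced to an in-process toy. `ServiceBroker`, `ServiceDescription` and `keyword_search` are hypothetical names; in a real deployment the description would be a WSDL document, the broker a registry, and the endpoint a network address rather than a Python callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceDescription:
    name: str                       # hypothetical service name
    category: str                   # what requesters look up, e.g. "search"
    endpoint: Callable[[str], str]  # stands in for a network endpoint


class ServiceBroker:
    """Toy registry: providers publish, requesters find."""

    def __init__(self) -> None:
        self._registry: Dict[str, List[ServiceDescription]] = {}

    def publish(self, description: ServiceDescription) -> None:
        self._registry.setdefault(description.category, []).append(description)

    def find(self, category: str) -> List[ServiceDescription]:
        return list(self._registry.get(category, []))


def keyword_search(query: str) -> str:
    # Hypothetical provider implementation.
    return f"results for {query!r}"


broker = ServiceBroker()

# 1. Provider publishes its description to the broker.
broker.publish(ServiceDescription("KeywordSearch", "search", keyword_search))

# 2. Requester asks the broker for matching services.
candidates = broker.find("search")

# 3. Requester binds to the provider directly; the broker is no longer involved.
service = candidates[0]
print(service.endpoint("digital libraries"))
```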
Elements of SOA
[diagram I don't agree with]
Web Services Stack
[diagram]
WS-BPEL "already widely used, lots of applications"
* Above BPEL - Coordination (WS-Coordination, WS-AtomicTransaction, WS-Notification...) [not mature]
* Security
* Management (WSDM)
* Contracting (trading partner agreement - paper contract)
Challenge: Semantic standards are still in development
Grid Computing: An Application of SOA
Software: Globus Toolkit...
[Globus architecture diagram]
which leads to... Open Grid Service Architecture (OGSA)
Idea: Service orientation to virtualize resources
* Extended Web Services -> Grid Services - Web Service Resource Framework (WSRF)
[OGSA Architecture diagram]
WSRF
* Unified way to model and interact with stateful Web Services
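[my sketch, not from the slides] As I understand it, the WSRF idea is that the service itself stays stateless and each piece of state is a separately addressable resource with properties and a lifetime. A toy version of that pattern follows; `CounterService` and its method names are invented for illustration and are not the WSRF operations themselves.

```python
import uuid
from typing import Dict


class CounterService:
    """Stateless front end; all state lives in resources addressed by key."""

    def __init__(self) -> None:
        self._resources: Dict[str, Dict[str, int]] = {}

    def create(self) -> str:
        # Returns a key that plays the role of a resource reference.
        key = uuid.uuid4().hex
        self._resources[key] = {"value": 0}
        return key

    def increment(self, key: str) -> int:
        self._resources[key]["value"] += 1
        return self._resources[key]["value"]

    def get_property(self, key: str, name: str) -> int:
        # Read one named property of the addressed resource.
        return self._resources[key][name]

    def destroy(self, key: str) -> None:
        # Explicit end of the resource's lifetime.
        del self._resources[key]


service = CounterService()
a = service.create()
b = service.create()
service.increment(a)
service.increment(a)
service.increment(b)
print(service.get_property(a, "value"), service.get_property(b, "value"))
```

Every call carries the resource key, so two clients can talk to the same service and still see independent state.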
Peer-to-Peer Computing
Summary
== ==
[Overview of Motivation and the three projects]
Web Services and Distributed DL Infrastructures
Challenges
* granularity of services
* Semantics of services
BRICKS, Diligent and DELOS
BRICKS
* transparent access to distributed available information sources
* retrieval of info with knowledge support
* multi-lingual
* easy
* platform independent
BRICKS Approach
* SOA
* Decentralized
* Open Source
BRICKS Node (BNode)
P2P network of BNodes
Budget: 12.2 million Euros
project nearing completion ?
http://www.brickscommunity.org/
The Diligent Project
Diligent develops a DL test-bed infrastructure that allows virtual organizations to create
on-demand DL including computing, storage, multi-content support, app resources.
use a grid computing infrastructure for resource allocation
Key Concepts
* SOA
* Integrating DL services on infrastructure from Enabling Grids for E-sciencE project (EGEE)
* Enhances existing grid services with complex service interactions required to build, operate and
maintain transient virtual digital libraries
Diligent Architecture
[diagram]
Budget: 9.55 million Euros
http://www.diligentproject.org/
The DELOS Project
Network of Excellence to coordinate the development of next generation digital libraries.
[huge project]
* Reference Model for DLs
* Architectures
Delos DLMS
* Specialized DL functionality from DELOS and non-DELOS partners is made available and integrated by means
of (Web) services
OSIRIS: Integration of Services
Q (me): So there are three frameworks, which one do I use? All three?
A: Err, yes. Still in early stages.
== left ==
There was info on how the systems do content and metadata management, while
I was at another tutorial.
== back ==
BRICKS
* Personalization
User profiles add customization on top of e.g. Content, Presentation, Services, Interaction.
Model the User
* many aspects
- will focus on preferences
[a lot of details about how to capture user interest in e.g. keywords and their relationships]
Personalization Approaches
Recommenders: Content-based Recommenders
- "find me things that are related to things I have liked in the past"
Recommenders: Collaborative Filtering
- find objects similar PEOPLE have liked
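[my sketch, not from the slides] The two recommender styles side by side on made-up data. The `likes` and `item_keywords` tables and the use of Jaccard overlap as the similarity measure are my assumptions; the talk did not say which measure BRICKS uses.

```python
from typing import Dict, List, Set


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def content_based(user: str, likes: Dict[str, Set[str]],
                  item_keywords: Dict[str, Set[str]]) -> List[str]:
    """Unseen items whose keywords resemble those of items the user liked."""
    mine = likes[user]
    liked_keywords = set().union(*(item_keywords[i] for i in mine))
    scores = {item: jaccard(kw, liked_keywords)
              for item, kw in item_keywords.items() if item not in mine}
    return sorted((i for i, s in scores.items() if s > 0),
                  key=lambda i: (-scores[i], i))


def collaborative(user: str, likes: Dict[str, Set[str]]) -> List[str]:
    """Unseen items liked by people whose taste overlaps with the user's."""
    mine = likes[user]
    scores: Dict[str, float] = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted((i for i, s in scores.items() if s > 0),
                  key=lambda i: (-scores[i], i))


# Made-up data for illustration only.
likes = {"ann": {"a", "b", "c"}, "bob": {"a", "b", "d"}, "cat": {"e"}}
item_keywords = {
    "a": {"grid", "p2p"}, "b": {"grid", "soa"}, "c": {"p2p"},
    "d": {"soa", "grid"}, "e": {"cooking"},
}

print(content_based("ann", likes, item_keywords))
print(collaborative("ann", likes))
```

The content-based version never looks at other users; the collaborative version never looks at item keywords.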
What BRICKS provides is Personalized Search
e.g. Java = coffee vs. Java = programming language
Architecture
Query -> Query Personalization -> Result Ranking
Query Personalization
* dynamic enhancement of query using preferences from a user profile
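[my sketch, not from the slides] One way the "Java" disambiguation could work as query enhancement: add the related term the user's profile prefers most. The `related` table, the profile weights and `personalize_query` itself are invented for illustration; I don't know how BRICKS actually picks the extra terms.

```python
from typing import Dict, List


def personalize_query(query_terms: List[str],
                      profile: Dict[str, float],
                      related: Dict[str, List[str]],
                      max_extra: int = 1) -> List[str]:
    """Append up to max_extra related terms that the profile prefers."""
    extras: List[str] = []
    for term in query_terms:
        # Rank the candidate senses of this term by profile weight.
        ranked = sorted(related.get(term, []),
                        key=lambda t: profile.get(t, 0.0), reverse=True)
        for candidate in ranked:
            if (profile.get(candidate, 0.0) > 0
                    and candidate not in query_terms
                    and candidate not in extras):
                extras.append(candidate)
    return list(query_terms) + extras[:max_extra]


# Hypothetical data: two senses of "java", two different users.
related = {"java": ["coffee", "programming"]}
developer = {"programming": 0.9, "coffee": 0.1}
barista = {"coffee": 0.8}

print(personalize_query(["java"], developer, related))
print(personalize_query(["java"], barista, related))
```

A user with an empty profile gets the query back unchanged, so personalization degrades to plain search.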
Personalization in BRICKS
[diagram of Personalization Manager interacting with many foundational bricks]
User Model
* Term preferences (e.g. Pisa)
* Physical collection
* Ontology classes
* Attribute (e.g. author="Plato")
All with vectors of terms.
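[my sketch, not from the slides] If the profile is a vector of weighted terms, results can be re-ranked by how closely each one matches that vector. Cosine similarity over raw term counts is my assumption here, and the profile and result records are made up.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_results(results: List[Dict], profile: Mapping[str, float]) -> List[Dict]:
    """Order results by similarity of their text to the profile vector."""
    def score(result: Dict) -> float:
        terms = Counter(result["text"].lower().split())
        return cosine(profile, terms)
    # sorted() is stable, so ties keep their original order.
    return sorted(results, key=score, reverse=True)


# Hypothetical profile with term preferences, as in the "Pisa" example.
profile = {"pisa": 1.0, "tower": 0.5}
results = [
    {"id": 1, "text": "Plato and the Republic"},
    {"id": 2, "text": "The leaning tower of Pisa"},
]

print([r["id"] for r in rank_results(results, profile)])
```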
User Profiling
* Transparent user-action tracking and profile update
Currently support
* query personalization
* results ranking
Profiles are local and all processing is local.
The profiles are not distributed. (Eventually they will be.)
DELOS Search [and Annotations]
* Image similarity
* Audio retrieval
* Video retrieval
* 3D retrieval
Visualization and Interfaces
Collections, Annotations, Personalization and Search in DILIGENT
* Collection is the ultimate source for search
* Physical and Virtual collections (virtual - constructed by query)
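[my sketch, not from the slides] A virtual collection as a stored query over physical collections, evaluated when someone asks for its members. `VirtualCollection` and the sample records are hypothetical; DILIGENT's real collection service is certainly more involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VirtualCollection:
    name: str
    predicate: Callable[[Dict], bool]  # the query that defines membership

    def members(self, physical_collections: List[List[Dict]]) -> List[Dict]:
        """Evaluate the query across all physical collections on demand."""
        return [item
                for collection in physical_collections
                for item in collection
                if self.predicate(item)]


# Two made-up physical collections held by different providers.
museum = [{"id": "m1", "type": "image", "topic": "pisa"},
          {"id": "m2", "type": "text", "topic": "plato"}]
archive = [{"id": "a1", "type": "image", "topic": "plato"},
           {"id": "a2", "type": "image", "topic": "pisa"}]

pisa_images = VirtualCollection(
    "Pisa images",
    lambda item: item["type"] == "image" and item["topic"] == "pisa",
)

print([item["id"] for item in pisa_images.members([museum, archive])])
```

Nothing is copied: adding an item to a physical collection changes the virtual collection the next time it is evaluated.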
Personalization
* Manage and maintain user profiles
* Query personalization
currently there is only manual assignment of profiles,
eventually there will be user behavior tracking
The DILIGENT Search Engine
* Composed by a distributed set of services communicating via WS-* protocols being
orchestrated under a master component
... [lots of detail]
Query execution plans are ultimately converted to Process Execution Engine (PES) compliant
workflows (BPEL4WS).
Workflow is submitted to PES
[very complex queries are supported]
Lessons Learned
* big complex subject with ongoing investigations
General Methodology Lessons
* Communication of the concept of distributed architectures to the end user is a hard process
* Early prototyping is helpful
* All three technologies (SOA, P2P, Grid) and their implementations are still evolving
* Developers and system designers have
- a long learning curve
- difficulties in implementing a DL on top of these three technologies
When you work with librarians, they can't dream hard enough = they don't know what is possible.
Technologists have to work hard with librarians to make them understand what can be done.
General Tech Lessons
* The three technologies are not orthogonal
* All three are relevant to future DLs
* No tech can meet the requirements of a DL alone
* But you don't need all technologies always for everything - select as appropriate
Specific Tech Lessons - SOA
* Important
- functionality can be combined across network borders
* A service-oriented design is different from traditional design
- e.g. the number of functions should be limited, due to [various] considerations
- service invocations are NOT like LAN remote procedure calls
* Communication costs between services are often underestimated
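[my sketch, not from the slides] Back-of-the-envelope arithmetic for why service granularity matters. The numbers (100 items, 50 ms per round trip, 1 ms of processing per item) are invented; only the shape of the comparison is the point.

```python
def total_latency_ms(calls: int, round_trip_ms: float,
                     items: int, per_item_ms: float) -> float:
    """Network overhead for every call plus processing for every item."""
    return calls * round_trip_ms + items * per_item_ms


ITEMS = 100           # made-up workload size
ROUND_TRIP_MS = 50.0  # made-up cost of one remote service invocation
PER_ITEM_MS = 1.0     # made-up processing cost per item

# Fine-grained design: one remote call per item, as if it were a local call.
chatty = total_latency_ms(ITEMS, ROUND_TRIP_MS, ITEMS, PER_ITEM_MS)

# Coarse-grained design: one remote call carrying the whole batch.
batched = total_latency_ms(1, ROUND_TRIP_MS, ITEMS, PER_ITEM_MS)

print(f"chatty:  {chatty:.0f} ms")
print(f"batched: {batched:.0f} ms")
print(f"ratio:   {chatty / batched:.0f}x")
```

With these assumed numbers the chatty design spends 5000 of its 5100 ms on round trips, while the batched one finishes in 150 ms.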
Tech Lessons - P2P
* Beneficial when you have a lot of data
- supports redundancy
* Increased complexity in content security, access control, ...
Tech Lessons - Grid
* It gives you high computing power
* Critical for supporting computationally-intensive tasks
* Beneficial when you have a lot of data
- supports redundancy
* Not really targeting interactive or real-time applications but rather batch,
long-running processing tasks
* Fine-grained security complicates planning query execution
* (Provides a) standardized mechanism for on-the-spot processing and exchanging large
amounts of XML data
* Despite its current shortcomings, WSRF offers an elegant substrate for building
a dynamic distributed system based on standards
* Failure to carefully plan complex workflows ... might lead to ... execution failures
* Porting database concepts and workflow to DLs opens opportunities for distributed information retrieval
[presented by Yannis from University of Athens]
Conclusions and Open Issues
Currently...
[diagram of current digital library silos]
Content-centric, static storage, isolated, environment-specific, isolated and repeated efforts
Myths
DL for library only
DL for cultural heritage only
Future Development Methodology
[diagram of shared generic DL stuff]
* Generic DLMS tech to build on
Future DL
* Person-centric
* Targeted for active communication/collaboration
* Global distributed interacting systems
* Generic DLMS to build on
* "All" applications (not just libraries and cultural heritage)
Develop from top-down, from the user interface / user requirements down
Interop between DLs
Conclusions
* Still early in the game
* These systems can serve as first versions / early prototypes - they are not ready for industrial production
Open Issues
* Grand unified theory of SOA, P2P and Grid
* Grand unification of BRICKS, DELOS, DILIGENT...
* Standards
* Tons of research issues
- distributed search and workflow optimization
- distributed information fusion
- intelligent caching of information and processing state
- intelligent placement and replication of information and processing
- ...
* two more projects
- BELIEF - how to exploit "knowledge infrastructure"
- DRIVER - distributed access to international repositories
Previously:
January 15, 2005 DELOS - digital library architecture