11:00 Keynote 2
Queries and Clicks as a Source of Knowledge
Ricardo Baeza-Yates
Yahoo Research
Barcelona, Spain & Santiago, Chile
http://www.dcc.uchile.cl/~rbaeza/
web (chaotic) - DL (ordered)
[overview of Yahoo sites, amount of info, types of data]
"Information Games" - win if you match tags
http://www.espgames.org/
Observed Data
* Query Logs
* Result/Web Clicks
* Advertising clicks
* Social
Talks about "safe" (trusted) vs "dangerous" (false, spammy) information/sources.
The Wisdom of Crowds
The Power of Social Media
Motivations for Web Mining
* The Dream of the Semantic Web
- Obstacle: Us
* User Actions: Implicit Semantic Information
- free
- large volume
- unbiased
- can we capture it?
- hypothesis: Queries are the best source
Mining Queries for...
* Improved Web Search
* User Driven Design
- Information Scent
- web site that users want
- web site that you should have
- improve content & structure
* Bootstrap of pseudo-semantic resources
Web Queries
* short queries & impatient interaction
* smaller and different vocab
* different user goals (Broder, 2000)
- informational need
- navigational need
- transactional need
* Refined by Rose & Levinson, WWW 2004
http://citeseer.ist.psu.edu/rose04understanding.html
Yahoo Mindset
http://mindset.research.yahoo.com/
Relevance of the Context
* moving to less information, more context
Context
* Who you are
* Where you are
* What you are doing
* Issues: privacy ...
* Sources: Web, CV, usage logs...
* Goals: personalization, localization, better ranking in general...
Context in Web Queries
* IP, time, location (based on IP), interaction history, task, OS, browser...
User Intention
* Kang & Kim, SIGIR 2003
- their method was not effective: 60%
* Liu, Lee & Cho WWW 2005
- prediction power 90%
Yahoo
* Manual classification of more than 6000 popular queries
- query intention and topic
- classification and clustering
- machine learning
* Baeza-Yates ? 2006?
Results: You can do (machine learning?) classification of intention on information queries
Next step: Clustering Queries
* Define relations among queries
- common words
- common clicked URLs: works better = natural clusters
* Define distance function among queries
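[Note-taker's sketch, not from the slides] One way to picture a distance function built on common clicked URLs is the Jaccard distance between the sets of URLs clicked for each query. The details of the method used in the talk were not recorded here, so everything below is an assumption made for illustration: the function names, the example.* URLs, the 0.7 threshold and the greedy grouping are all hypothetical.

```python
from collections import defaultdict

def clicked_urls(log):
    """Group click-log rows (query, url) into a set of clicked URLs per query."""
    urls = defaultdict(set)
    for query, url in log:
        urls[query].add(url)
    return dict(urls)

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|: 0.0 for identical click sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)

def cluster_queries(urls, max_distance=0.7):
    """Greedy grouping (an assumption): a query joins the first cluster that
    already holds a query within max_distance, else it starts a new cluster."""
    clusters = []
    for query in sorted(urls):
        for cluster in clusters:
            if any(jaccard_distance(urls[query], urls[other]) <= max_distance
                   for other in cluster):
                cluster.append(query)
                break
        else:
            clusters.append([query])
    return clusters

# Made-up click log; the URLs are placeholders.
log = [
    ("cheap flights", "http://flights.example/"),
    ("cheap flights", "http://travel.example/"),
    ("low cost airfare", "http://flights.example/"),
    ("low cost airfare", "http://travel.example/"),
    ("python tutorial", "http://docs.example/python"),
]
urls = clicked_urls(log)
clusters = cluster_queries(urls)
print(clusters)
```

In this toy log, "cheap flights" and "low cost airfare" share no words but share both clicked URLs, so they land in the same cluster. That is the kind of case where a common-words relation would miss the link.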
Yahoo Approach
* Can we cluster queries well?
* Can we assign user goals to queries?
[details of method]
The user queries represent in a way the user view of your data/system.
Uses
* Improved ranking
* Word classification
- e.g. synonyms in the same cluster
* Query recommendation (ranking of suggested queries)
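[Note-taker's sketch, not from the slides] Query recommendation from clusters could look roughly like this: find the cluster containing the user's query and suggest the other queries in it, ranked. The ranking criterion was not captured in these notes, so ranking by how often each query was issued is an assumption, and the function name, the clusters and the counts are made up.

```python
def recommend(query, clusters, frequency, limit=3):
    """Suggest other queries from the cluster containing `query`, most frequent first."""
    for cluster in clusters:
        if query in cluster:
            others = [q for q in cluster if q != query]
            # Descending frequency, then alphabetical order to break ties.
            return sorted(others, key=lambda q: (-frequency.get(q, 0), q))[:limit]
    return []  # query not seen in any cluster: nothing to suggest

# Hypothetical clusters and query counts.
clusters = [
    ["cheap flights", "low cost airfare", "budget airline tickets"],
    ["python tutorial"],
]
frequency = {
    "cheap flights": 120,
    "budget airline tickets": 80,
    "python tutorial": 60,
    "low cost airfare": 45,
}
suggestions = recommend("low cost airfare", clusters, frequency)
print(suggestions)
```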
Building Taxonomies
* Infer topics for queries that imply documents
Result: Automatic classification is better than (single or a small group of) humans! (but/because the auto classification is based on the actions of many many people)
Final Remarks
* Many potential uses of the wisdom of people
Q: Compare user pseudo-taxonomies to taxonomies autogenerated from text
A: probably they are quite different - people use different words (or a different majority set of words) in queries than they do in written text
Q (Rachel Heery UKOLN): can this pick up new terms (new words) being used?
A: working on this