Relevance and the Future of Search
Stephen E. Arnold
http://www.arnoldit.com/
[sidebar
books:
Google Legacy
Enterprise Search Report
InformationWeek Author: Google's Patents Reveal Strategy To Beat Microsoft
]
Relevance: What? For Whom? When? And How?
He has examined Google's patents.
"patent fence around: relevance, individualizing search results, algorithms to calculate editorial value of a page"
Google: more patents in the first 6 months of 2005 than in its entire previous history.
Roughly a 2 year retrospective.
Google: 62% of weblog referrals. Yahoo: 35%. MSN: 13%.
Previously: Google 51%. So Google is capturing a big share of search.
Cover story in Online: Relevance and the End of Objective Hits
Q: Is Google more than search?
- the vast majority of the audience said it is more than search
What makes it different?
Section 1: Googleplex
commodity hardware
search shows how clever their HARDWARE ENGINEERING and SOFTWARE ENGINEERING are
[diagram of Googleplex - Google architecture - the Earth is surrounded by Google's distributed, parallel, supercomputer, with continuously expanding storage / infrastructure, 165,000 servers in 16-32 datacentres]
performance: 15-40x faster than IBM, HP and Dell at 1/7th the cost
for every dollar Google spends, Microsoft and Yahoo must spend 7 or more
Google just plugs in commodity components.
Relevance: Now a Fuzzy Black Box
[fire alarm]
[diagram showing various relevance needs: need for users...]
[We just had to evacuate due to a fire alarm.
Photos will be forthcoming.
We're back now.]
Relevance under siege:
- small overlap between the results from Ask Jeeves, Google and Yahoo
- 90% of search users cannot differentiate between search results and ads
When is a "hit" relevant?
Individualized Google
- Google is interested in providing INFORMATION THAT SOLVES A PROBLEM - this could be an ad
Patents on Individualizing Results
- list of many examples of personalization: News, Toolbar, Desktop Sidebar, Desktop Search, Local... more than 51 services to which personalization can be applied in some manner
"Google is a mathematics-based entity" - users provide click data, Google analyzes it, and the result is actionable information; what users do is watched and analyzed
"significant consequences for competitive technical intelligence"
How does this math (these metrics) affect relevance?
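The talk does not say which metrics Google derives from click data. As a purely hypothetical illustration of how clicks can become a ranking signal, the sketch below computes a click-through rate per (query, URL) pair from an impression log and reorders results by it. The log format, the function names and the idea of ranking on raw click-through rate are all assumptions made for the example, not a description of Google's system.

```python
from collections import defaultdict


def click_through_rates(log):
    """Compute the click-through rate for each (query, url) pair.

    `log` is a hypothetical list of (query, url, clicked) tuples,
    one tuple per time a result was shown to a user.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, url, was_clicked in log:
        shown[(query, url)] += 1
        if was_clicked:
            clicked[(query, url)] += 1
    return {key: clicked[key] / shown[key] for key in shown}


def rerank(query, urls, ctr):
    """Order candidate URLs for a query by observed click-through rate.

    URLs never shown for this query get a rate of 0.0.
    """
    return sorted(urls, key=lambda u: ctr.get((query, u), 0.0), reverse=True)
```

For example, if result "b" was clicked every time it was shown and "a" only half the time, `rerank` moves "b" above "a". A real system would also have to correct for position bias (top results get clicked more simply because they are on top), which this sketch ignores.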
Section 3: Search-Engine Optimization (SEO)
Relevance is about both what is on your site and how often Google indexes you.
Ranking is important in Google.
Example: Aviadian made changes to their site (based on SEO recommendations) and ended up in the "Google sandbox", i.e. they were blacklisted
Five Musts
* In-bound links from high-traffic sites
* Fresh, semantically "tight" content
* Site map that points to what you want indexed
* Well-formed pages
* Appropriate metatags
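The "site map" must can be made concrete with a minimal sketch that writes an XML sitemap listing the pages you want indexed. It assumes the XML format published at sitemaps.org (version 0.9, namespace `http://www.sitemaps.org/schemas/sitemap/0.9`); the function name `build_sitemap` and the example URLs are hypothetical, and only the required `<loc>` and optional `<lastmod>` elements are shown.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9 (assumed format).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a str) listing the given pages.

    `urls` is a list of (loc, lastmod) tuples; lastmod may be None.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes "&" in URLs as "&amp;", as the format requires.
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            # W3C datetime format, e.g. "2005-10-25".
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "2005-10-25"),
        ("http://www.example.com/papers/relevance.html", None),
    ]))
```

Listing only the pages you actually want indexed keeps the crawler focused on your "tight" content rather than on printer-friendly duplicates or session-specific URLs.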
He urges interlinking: point to relevant content, and have relevant sites point to you.
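The notes do not describe how Google's link analysis actually works. The PageRank formulation published by Brin and Page is a commonly cited illustration of why in-bound links from well-linked sites matter, so here is a minimal power-iteration sketch of it. The damping factor of 0.85, the fixed iteration count and the toy link graph are assumptions, and Google's production ranking uses many more signals than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a dict mapping each page to its PageRank score.

    `links` maps each page to the list of pages it links to.
    Scores always sum to 1.0.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {
        "hub": ["you", "other"],
        "other": ["hub"],
        "you": ["hub"],
        "orphan": ["you"],
    }
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page:7s} {score:.3f}")
```

In the toy graph, "you" outranks "other" because it is linked from both the well-linked "hub" and "orphan", while "orphan", with no in-bound links at all, keeps only the minimum random-jump share.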
A site with highly diverse content will rank lower; Google wants "tightly" focused content,
including in the URLs, metatags, ...
"What about MS and Yahoo? For all intents and purposes, relevance is defined by Google's guidelines"
Have accurate, well-coded pages.
Session ids cripple Google's technology.
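The talk does not say exactly why session IDs hurt. A likely reason is that each visitor (including the crawler) gets a fresh ID in the URL, so a single page appears under many different URLs, which wastes crawl effort and looks like duplicate content. The sketch below shows the crawler-side view: it canonicalizes URLs by stripping session parameters so that the variants collapse into one. The parameter names in `SESSION_PARAMS` are hypothetical examples; the real fix on your own site is to keep session state out of URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query-parameter names that carry per-visit session state.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonicalize(url):
    """Strip session identifiers so that one page maps to one URL."""
    parts = urlsplit(url)
    # Some Java servlet containers append ";jsessionid=..." to the path.
    path = parts.path.split(";jsessionid=")[0]
    # Keep every query parameter except the session ones (case-insensitive).
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(query), parts.fragment)
    )
```

Two visits to the same product page, `item.php?id=7&PHPSESSID=abc123` and `item.php?id=7&PHPSESSID=def456`, both canonicalize to `item.php?id=7`.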
Five Cheats
* Steal text from high-ranking sites. Risk: duplicate detection will blackball you.
* Metatag spamming. Risk: low ranking.
* Blog seeding. Risk: spam detection algorithm can sandbox your site.
* List yourself in link farms. Risk: link analysis algorithm will sandbox your site.
* Doorway cheat. Expose one page to Googlebot and another to a human. Risk: remove site from index.
Google calculates semantic similarity between blog postings, and if it detects duplicates, it will downgrade you.
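How Google measures similarity between postings is not described in the talk. A well-known published technique for near-duplicate detection is w-shingling with Jaccard resemblance (Broder et al.), sketched below. Note that it is lexical rather than truly semantic: it compares overlapping runs of words, not meaning. The shingle width of 4 words and the 0.5 threshold are arbitrary assumptions chosen for illustration.

```python
import re


def shingles(text, w=4):
    """Return the set of w-word shingles (contiguous word runs) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        # Very short texts become a single (possibly empty) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, threshold=0.5, w=4):
    """Flag two texts as near-duplicates when resemblance meets the threshold."""
    return resemblance(a, b, w) >= threshold
```

Copying a posting and changing one word leaves most shingles intact, so the copy still scores well above the threshold, whereas genuinely different text shares almost no 4-word runs.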
"[in terms of technology] Google has a Ferrari, everyone else has a Mustang"
What's the boundary between SEO and "real indexing"?
Section 4: GUIs
How do we address this? The answer is the interface.
eToys - clickable links that narrow the search, e.g. to a particular age group
Endeca?
Hybrid Search: Facets, Hard Coding, Synonym Expansion - example of pre-selected search results
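The notes do not show how eToys or the hybrid approach on the slide is implemented. As a minimal sketch of the general idea, the code below combines hand-coded synonym expansion with facet filters that behave like the clickable "narrow by age group" links. The catalog, the synonym table and the function names are all hypothetical.

```python
# Hypothetical product catalog in the spirit of the eToys example.
CATALOG = [
    {"name": "Wooden train set", "age": "3-5", "category": "toys"},
    {"name": "Toy locomotive kit", "age": "8-12", "category": "toys"},
    {"name": "Picture book about trains", "age": "3-5", "category": "books"},
]

# Hypothetical hand-coded synonym table used for query expansion.
SYNONYMS = {"train": {"train", "trains", "locomotive"}}


def expand(term):
    """Return the lowercased term plus any hand-coded synonyms."""
    term = term.lower()
    return SYNONYMS.get(term, {term})


def search(term, **facets):
    """Keyword search with synonym expansion, narrowed by facet values."""
    words = expand(term)
    hits = []
    for item in CATALOG:
        name_words = set(item["name"].lower().split())
        # Keep the item only if its name contains the term or a synonym.
        if not words & name_words:
            continue
        # Each facet (e.g. age="3-5") acts like a clickable filter link.
        if all(item.get(key) == value for key, value in facets.items()):
            hits.append(item["name"])
    return hits
```

Here `search("train")` also finds the locomotive kit and the book about trains through synonym expansion, and `search("train", age="3-5", category="toys")` narrows the list to the wooden train set. This also illustrates the question on the slide: whoever writes the synonym table and chooses the facets is defining relevance.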
Who defines relevance in a GUI?
Summary
* Many ways to deliver relevance
* Understand context of content
* Assess the basics
- Provenance
- Accuracy
- Currency
- Selective depth
* SEO is a legitimate "indexing model"
* More work needed on situational relevance
My summary: I found it a bit difficult to completely grasp the concerns he was expressing. He is definitely very concerned about Google's size and its advantage in computational power. I initially thought he was recommending that you use SEOs in order to get a good rank, but if I understood his response to a question correctly, he was actually saying that it worries him a great deal that Google plus the SEOs determine how your site ranks, and that, depending on them, important information may get a low ranking.
I have to say that, for myself, I think SEOs are total voodoo and I would never use one. The only thing I have ever done is write detailed technical content, and I frequently show up not only on the first page of the Google results but also quite high up on that page.
For now, I would recommend just producing good content, updating it frequently, making sure that you are well linked from other sites, and making sure that the X/HTML for your pages is clean, i.e. validates without errors.
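A quick first check for clean X/HTML can be scripted. The minimal sketch below only tests XML well-formedness (every tag closed and properly nested) using Python's standard-library parser. It is not full validation against the XHTML DTD, which a tool such as the W3C Markup Validation Service performs, and because no DTD is loaded, named entities such as `&nbsp;` will be reported as errors. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml):
    """Return (True, None) if the markup parses as XML, else (False, message).

    Checks well-formedness only; it does not validate against a DTD.
    """
    try:
        ET.fromstring(xhtml)
        return True, None
    except ET.ParseError as err:
        # err includes the line and column of the first problem found.
        return False, str(err)
```

Running it over each page before publishing catches unclosed or mis-nested tags; pages that pass should still go through a real validator.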