I want to look back on how we almost built an ecosystem of information for human and machine readers, and then it fell apart.
Below, I will see if I can tell the tale of the decline of the blogosphere and end up with thoughts about the Antikythera Mechanism and scholarly communication.
In 2007, Darren Barefoot wrote in The Tyee about an era that, just 9 years later, is totally gone.
I subscribe to the RSS feeds for about 175 blogs.
Further down, he writes
Technology reporter and uber-geek Tod Maffin runs Inside the CBC. It's kind of an industry blog, in that it covers the world of Canadian public broadcasting...
The Rise and Fall of Inside the CBC
Let's have a look at Inside the CBC now, in 2016
It is definitely not by Tod Maffin, nor does it cover the world of Canadian public broadcasting. It's more or less what we would call this year fake news. More specifically, it's a kind of human content written for robots, but not at all in the way I intended. It's most likely, given the jumble of topics (neopets, skates, weddings in Gatlinburg), a search-keyword-driven tasking of quick content creation. "Neopets searches are peaking, quick, write something about neopets!"
Which is to say, the ecosystem of interlinking conversations that Barefoot describes in 2007 is quite gone. It's clear there's some complicated history behind its demise, but for simplicity's sake, here's what we can find from the Internet Archive: by February 1, 2011 the blog has a posting from January 10, 2011 at 8:58 pm. And that blog post remains, frozen in time at the top of the page until at least January 28, 2013. By June 21, 2014 the page reads simply "The domain insidethecbc.com is no longer parked by GoDaddy." By December 21, 2014 it has become, rather unexpectedly, a blog in German: "Wir haben ein großes Forschungszentrum mit über 100 Mitarbeitern und Niederlassungen in Übersee und Asien." / "We have a large research center with over 100 employees and branches in overseas and Asia." By September 9, 2016 it has transformed again, writing in (sort-of) English about "The advantage of getting nucific bio x4 coupon codes" and by October 22, 2016 it has settled into its current format.
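If you want to retrace this kind of timeline yourself, the date-by-date lookups can be scripted. What follows is a minimal sketch, assuming the Internet Archive's public Wayback availability endpoint and the JSON shape it returned when I last looked (an archived_snapshots object with a closest entry). The function names are my own, and lookup() needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine availability endpoint lives at this address.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, yyyymmdd):
    """Build the query URL asking for the snapshot closest to a given date."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode(
        {"url": page_url, "timestamp": yyyymmdd}
    )


def closest_snapshot(response_text):
    """Return (timestamp, snapshot_url) from a response body, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(page_url, yyyymmdd):
    """Ask the archive for the capture nearest to yyyymmdd (needs network)."""
    with urlopen(availability_query(page_url, yyyymmdd), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling lookup("insidethecbc.com", "20110201") should, if the endpoint behaves as assumed, return the capture nearest to that date, or None when nothing was archived.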
The long and short of this is, firstly, that this is a disaster for a "many small pieces, loosely joined" ecosystem, and secondly, that without the Internet Archive, or if the current owner of Inside the CBC simply changed their robots.txt, all of the original site would be gone.
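To make the robots.txt risk concrete, here is a small sketch using Python's standard urllib.robotparser. The two-line robots.txt is hypothetical, and I am assuming ia_archiver as the user agent the Wayback Machine has historically honoured. It only shows how such a rule reads to a crawler, not how the Archive applies it internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a new domain owner might publish.
# Assumption: "ia_archiver" is the user agent the Internet Archive respects.
HYPOTHETICAL_ROBOTS_TXT = [
    "User-agent: ia_archiver",
    "Disallow: /",
]


def is_allowed(user_agent, page_url, robots_lines=HYPOTHETICAL_ROBOTS_TXT):
    """Report whether robots_lines permit user_agent to fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse the rules directly, no network needed
    return parser.can_fetch(user_agent, page_url)
```

With these rules, is_allowed("ia_archiver", "http://insidethecbc.com/") is False, while any other crawler name is still allowed. Two lines of text are enough to shut the archive out.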
The Other Shoe Drops
The story of Inside the CBC turned out to be more complicated than I thought. Is it illustrative of the decline of the blogosphere? Well, just seven years after his article in The Tyee, Barefoot is blogging "In 2014, what is my blog for?"
What has happened? Well, basically, it turned out that this world of interlinking blogs and feed reading at best fell apart, at worst was deliberately dismantled.
I realise the latter is a more compelling story than the former, but it's a combination of factors. In the In Our Time episode The Library of Alexandria, we may look to hear again the story of how the library burned, but the conclusion is actually that it really just faded away. With the rise of the Christian Roman Empire, the old knowledge and the old conversations, the dialogues between the books, just weren't of as much interest any more.
But one of the reasons we still have this discussion about the destruction of the library is because of the reality of so much loss of information.
In Reality Is Not What It Seems (BBC Radio 4 adaptation episode 1, Fiat Lux), Carlo Rovelli tells the history of physics in the conventional way that most western European scientists do, beginning with the Greeks. In particular, he speaks of Democritus, and of his dismay at the loss of Democritus' original writings. "We know of his thought only through the quotations and references made by other ancient authors, and by their summaries of his ideas. I often think that the loss of the works of Democritus in their entirety is the greatest intellectual tragedy to ensue from the collapse of Classical civilization."
This formulation of the history of physics is so common that The Big Bang Theory parodies it in episode 3x10, where in order to teach Penny about Leonard's current research, Sheldon will only present the topic by starting with "It is there in Ancient Greece that our story begins..."
The episode is entitled The Gorilla Experiment, and it aired in 2009. It's not to be confused with 7x23, The Gorilla Dissolution, which aired in 2014.
Consider that while Rovelli is lamenting the loss of writings from 2400 years ago, we are not even doing a good job of maintaining writings from 9 years ago. In fact, some Internet content is only now available through quotations and references made by other bloggers.
(And it's worth noting that when you read this, you probably won't be able to listen to Fiat Lux, because the BBC is only making it available online for another 25 days. And depending on your local copyright laws, you may not be able to legally view a clip from The Big Bang Theory without having purchased access to the episode.)
Thanks Google and Facebook
It didn't help that Google closed Google Reader. It's a minor miracle that Google FeedBurner still exists. The demise of (Facebook-owned) FriendFeed removed a conversation option from the web. Overall, the ecosystem has shifted to closed commercial services and to search results driving traffic to commercial sites.
If you want an idea of how fragile this ecosystem is, turn to the Elections Quebec page on Electronic Voting. It used to include four press releases. These press releases have now been "archived", which in this particular case means removed entirely from the web. This is what the page looked like September 16, 2016 with the press releases (page from the Internet Archive)
and this is what it looks like now, with no indication the press releases ever were there.
There is nothing insidious about this; it's just standard web procedure. The press releases are from 2006 and were probably rarely accessed, so you run your ROT (Redundant, Outdated, Trivial) analysis and conclude that ten years online was enough and it's time for the press releases to go.
But once they go, they're gone. Are they in the Library and Archives Canada web archive? Are they in the BAnQ web archive? It's hard to know. Neither provides a public search interface that can uncover the Elections Quebec pages. Without coverage in the national archive, and in the provincial archive, and in the Internet Archive, content that is removed from the web is just simply gone forever.
So this is where we find ourselves. We have a single main service, the Internet Archive, that depends on private funding and that is attempting to make a backup of itself in Canada. Coverage from other web archives is unclear and may be nonexistent. Every day, as websites are reorganised, or content is deliberately removed for various reasons, or websites are simply neglected until the domain expires, our online Library of Alexandria fades. Not in some blaze of destruction, but more from lost interest, the same as the original.
You might think this is not a problem, because we can get the press releases from the Internet Archive, but it never indexed them. So for all intents and purposes they are gone now.
Because I happened to be paying particular attention to Canadian electronic voting information over the past few months, I was able to recover three of the press releases using a combination of Pinboard, Google and Bing caches. The search engine caches would have been replaced very quickly, so it's only good fortune that I was able to grab the content before it was lost.
I have also become a bit obsessed with manually adding pages to the Archive, which you can do by going to https://web.archive.org/, pasting the URL into the box under Save Page Now, and clicking Save Page.
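The same Save Page Now step can be scripted. This is a rough sketch, assuming that requesting https://web.archive.org/save/ followed by the page address triggers a capture the same way the form does. The helper names and the User-Agent string are my own inventions, and the service may rate-limit or refuse anonymous requests.

```python
from urllib.request import Request, urlopen

# Assumption: prefixing a page address with this path requests a capture.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(page_url):
    """Build the address that asks the Archive to capture page_url."""
    return SAVE_PREFIX + page_url


def request_capture(page_url, timeout=60):
    """Send the capture request and return the HTTP status (needs network)."""
    req = Request(
        save_page_now_url(page_url),
        headers={"User-Agent": "archive-helper-sketch"},  # hypothetical name
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Looping request_capture() over a list of press release addresses, with a polite pause between requests, would be one way to archive a whole set of pages before they are "archived" off the web.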
The Loss of the Immune System
Blogging, with its interlinked web of discussions, was a kind of web immune system. It wasn't perfect, but it was a way to have a conversation, and a way to provide signals to Google about what was and wasn't important, and to some extent, about what was and wasn't true.
It used to be possible to blog about a topic and get discovered through the network of blogs, to get added to feed readers, to become a new information source.
It used to be possible to blog about a topic and get good Google search ranking, to be discovered through search and thus be an important contributor to the conversation.
This is all basically gone. Many of the blogs are gone, the blog discovery ecosystem is gone, the feed readers are gone, and Google search rank is very hard to get.
The Post- Era
Here's a quote
For some, this past election year was about the slow death of the current political system.
Can you place it in time? It's from... 1997. Jon Katz wrote enthusiastically in Wired about how online conversations were going to transform political discourse.
On the Net last year, I saw the rebirth of love for liberty in media. I saw a culture crowded with intelligent, educated, politically passionate people who – in jarring contrast to the offline world – line up to express their civic opinions, participate in debates, even fight for their political beliefs.
I watched people learn new ways to communicate politically. I watched information travel great distances, then return home bearing imprints of engaged and committed people from all over the world. I saw positions soften and change when people were suddenly able to talk directly to one another, rather than through journalists, politicians, or ideological mercenaries.
I saw the primordial stirrings of a new kind of nation – the Digital Nation – and the formation of a new postpolitical philosophy.
Jon Katz, in 2016, now writes books about dogs.
So how did the hopes for online conversation and engagement go to the dogs? How did we get from a postpolitical philosophy to post-truth as the word of the year?
Amusing Ourselves to Death in 2016
2007 may have been the tipping point. The iPhone was announced in January of 2007.
Sitting in front of a desktop, you have a keyboard and a mouse and a screen. This drives a certain kind of text-based, highlight-and-insert content creation.
Smartphones and tablets, on the other hand, are terrible at long text and inserting content. They are great for creating photos and videos. And so now that's the world we have, the photo and video world.
All of the signals that we needed to rank and sort and link and discover are gone. Now it's just bam! image, bam! video. No context, no links, just endless streams of content. The web now is, in short, television rather than a library. And the consequences of this are huge.
In MIT Technology Review, Hossein Derakhshan writes
Before I went to prison [in 2008], I blogged frequently on what I now call the open Web: it was decentralized, text-centered, and abundant with hyperlinks to source material and rich background. It nurtured varying opinions. It was related to the world of books.
Then for six years I got disconnected; when I left prison and came back online, I was confronted by a brave new world. Facebook and Twitter had replaced blogging and had made the Internet like TV: centralized and image-centered, with content embedded in pictures, without links.
“The problem is not that television presents us with entertaining subject matter but that all subject matter is presented as entertaining.” (Emphasis added.) And, Postman argued, when news is constructed as a form of entertainment, it inevitably loses its function for a healthy democracy.
In other words, we used to be able to use the web to have a conversation, and now we are basically using the web to amuse ourselves to death.
The blog ecosystem helped to create a kind of web immune system, an immune system that Google could use to surface healthy information. With that gone, it's no wonder that false news can spread easily.
Finding the Antikythera Mechanism
Science depends on a web of citations, a web of knowledge. Without a web of knowledge on the actual web, how can we hope to make discoveries and determine what is of interest? How can we challenge information when we're in our filter bubbles, Facebooking to one another, off of the public web?
In 2008 (remember my talk from 2008? it's way back at the start of this blog post), I thought we might be able to meld human and machine understandings in order to advance the conversation. I promoted the idea of better formatting information for machine processing, in order that we could benefit from machine-aided search and discovery.
I deliberately chose not to emphasize automatically-generated information. Depending on the era, I've heard that the Semantic Web would solve discovery, or that OAI-ORE was going to link everything together, or now that Artificial Intelligence and Big Data will discover all connections automatically.
Would that this were so.
In Searching for Lost Knowledge in the Age of Intelligent Machines, Adrienne LaFrance writes
What if other objects like the Antikythera Mechanism have already been discovered and forgotten? There may well be documented evidence of such finds somewhere in the world, in the vast archives of human research, scholarly and otherwise, but simply no way to search for them. Until now.
This is a compelling vision of knowledge that we don't even know that we have, that could be unearthed if we just digitized and translated everything, and then sent our AIs digging for connections. Maybe Democritus is out there, in some copy of a copy of an Arabic translation. Maybe this whole lost world of complex mechanical devices is out there on paper somewhere, just waiting to be found.
But there is a pretty harsh collision between that vision and the reality of the web we've created.
We were on a path that might have enabled this scholarly discovery, although probably with a lot more human intervention than techno-utopians would like. But now we can't even find things from a few years ago. We were assembling the book-to-book conversations of a new Library of Alexandria, and now we're scraping down the pages like in the Archimedes Palimpsest
This medieval Byzantine manuscript then traveled to Jerusalem, likely sometime after the Crusader sack of Constantinople in 1204. There, in 1229, the original Archimedes codex was unbound, scraped and washed, along with at least six other parchment manuscripts, including one with works of Hypereides. The parchment leaves were folded in half and reused for a Christian liturgical text of 177 pages; the older leaves folded so that each became two leaves of the liturgical book.
But we're doing far, far worse than writing over thousand-year-old knowledge. We're completely erasing information from last year, with no X-ray to recover it.
Which is to say, as much as I love the vision of recovering the lost history of the Antikythera Mechanism, I'm more worried we won't even have a history of last year.
Remaking the Web We Made
This is the web now
The Web of Desperate Popup Intrusion.
That used to be the web of users deciding to subscribe to feeds, instead of being badgered to sign up to email newsletters, or to view ads, or to pay up immediately.
So let's try to unwind some of our mistakes. Rather than some AI utopia that will unearth lost connections from millennia past, let's deliberately build a human reality of intentional reconnection. Some suggestions:
- more slow, less fast
- more longform, less shortform
- more blogging, less Twitter
- open your feedreader, not your email
- less reaction to images, more thought about text
- get more sleep
- read more books
- put the smartphone down
- more services that let us pay for them, less ad-supported data-mining intrusion
- more public broadcaster support
- systematic public investment in web archiving
- less closed conversation, more open public conversation
- a more realistic blend of human-created metadata and machine analysis
I will readily admit that I got pulled away into the quick rewards of Twitter from the slow work of blogging. And it's become a self-reinforcing system. As blogging and feedreading faded, blog hits and links and comments faded. Blog rank in Google faded. Wearing another hat, I wrote 595 blog posts about online voting from 2004 to 2016. All that work, and (for the particular Google results I see today on this computer), the blog shows up once on page 5 of results for "online voting" canada. It's hard to maintain one's enthusiasm for 5th page ranking. It's hard to maintain one's blogging for two web hits a day.
But then I remember I originally started blogging just for myself. I didn't anticipate my blog would grow and connect in the way that it did for a while. So I am going to try to get back to blogging, because everywhere I go now, even for the few seconds of an elevator ride, I see people lost in their smartphone screens. I don't see how we can continue on losing ourselves in those little personalized screens.