I want to tell you about a number that scares me: 404. That infamous code you see when that internet thing you meant to visit is gone or it moved and no one bothered to add a redirect or maybe it never existed at all.
I’m curious though: how many of you have ever made something you’re proud of on the Web?
So many of us have written, recorded, photographed, or created important works in our personal and professional worlds that live online. Maybe they’re your bylines at that fancy publication about tiny houses, or your YouTube seltzer reviews, or your graduate thesis about the history of pizza ovens. It’s not really important what they are, just that they exist and they’re online.
Well, until…they don’t. 404: Page Not Found. 410: Gone. 500: Internal Server Error. These numbers, or status codes, tell us what went wrong but not really why. This problem, the problem of the disappearing internet, of “link rot”, is no joke. Researchers have found that over 50% of URLs cited in Supreme Court opinions no longer point to the intended content. Roughly 70% of links in academic legal journals are broken, and 20% of all science, technology and medicine articles suffer from link rot. The average life of a webpage hovers right around 100 days.
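To make those status codes a little more concrete, here is a minimal sketch of how you might check whether a link has rotted. It uses only Python's standard library. The function names `describe_status` and `check_link`, and the verdict labels, are my own inventions for illustration, not part of any standard tool.

```python
from http import HTTPStatus
import urllib.error
import urllib.request


def describe_status(code: int) -> str:
    """Turn an HTTP status code into a rough link-rot verdict."""
    try:
        phrase = HTTPStatus(code).phrase  # e.g. "Not Found" for 404
    except ValueError:
        phrase = "Unknown"  # non-standard code the stdlib does not know

    if code in (404, 410):
        verdict = "rotted"          # the page is missing or deliberately gone
    elif 500 <= code < 600:
        verdict = "server trouble"  # the server failed; the page may return
    elif code < 400:
        verdict = "alive"
    else:
        verdict = "other problem"   # e.g. 401 or 403: it exists but is blocked
    return f"{code} {phrase}: {verdict}"


def check_link(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request to url and describe what came back."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return describe_status(error.code)
    except urllib.error.URLError as error:
        # DNS failure, refused connection, expired domain, and so on.
        return f"unreachable: {error.reason}"
```

Calling `check_link` on every URL in an old article's footnotes would give a rough count of how many have rotted. As the paragraph above says, the code tells you what went wrong, not why.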
People often patly state that “the internet never forgets,” that once something is online, it will be forever. In a certain light that’s true. It’s nearly impossible to permanently remove something from the internet, on purpose. But, by the same token, the web also disappears at an alarming rate. 5% of the entire internet is lost every year, and we barely notice.
Making something on the web is not a one-time investment. Someone has to spend money every year on the domain, hosting, and maintenance. But what happens when the financial incentives to do that change? Right now the massive data centers that house all this information use 3% of all the electricity in the United States. What happens when that power gets too expensive? Or when we’ve been online for centuries and we start deleting dead people’s pages? Unlike a film, or a play, or a book, the costs of keeping art and science on the web are never-ending. We’re building one of our most important shared cultural resources on land that we rent rather than own, on borrowed time from a parking meter that’s all but guaranteed to run out.
We even saw a large-scale example of this recently when a capricious billionaire hastily took down years of content from Gothamist and DNAinfo, leaving reporters to scramble for saved and aggregated clippings of their work just to build a portfolio to get a new job.
Before you say, “Wait Matt, there’s this One Weird Trick. What about the Wayback Machine, what about the Internet Archive, what about Google’s cache?” let me quote the web developer Maciej Cegłowski in his talk Web Design - The First 100 Years:
We have heroic efforts like the Internet Archive to preserve stuff, but that's like burning down houses and then cheering on the fire department when it comes to save what's left inside. It's no way to run a culture. We take better care of scrap paper than we do of the early internet, because at least we look at scrap paper before we throw it away.
He’s right. It is no way to run a culture. We’re experiencing quantitative losses of data on par with the burning of Alexandria every year, and we’re barely blinking an eye as the stuff we’re making vanishes in a puff of smoke.
The truth is: there is no easy fix. But as writers and makers and inhabitants of the internet, we need to demand better of the platforms and services and publications we entrust with our work. It might seem safer to trust the big guys (Facebook, Twitter, Medium) with this content because they have the funding and incentives to maintain it. That’s true today, but large platforms like them have failed before, taking terabytes of data with them. Remember Friendster, TwitPic, Geocities?
There are academic efforts like Perma.cc out of the Harvard Library Innovation Lab that will solve this problem for the most important legal and scholarly works. But we can and must do better than that.
Starting in 2014, a small group of programmers became obsessed with building a “content addressable” version of the internet called IPFS. IPFS stands for “InterPlanetary File System”. And “content addressable” means that files are stored and located by their content instead of an arbitrary and therefore brittle address. As I’m sure some of you have guessed by now, it borrows heavily from the ideas behind the blockchain. Insert eye roll emoji 🙄. But before you write them off, I think these nerds might be on to something. Their system, which is entirely peer to peer and inherently resistant to the rot I’m talking about, is already being used to build a mirrored version of Wikipedia that will be accessible from countries with oppressive regimes, and was used by those in Catalonia seeking independence when the Spanish government blocked their pages from being accessible on the web. The IPFS team is building a system by which the websites and apps of tomorrow might be able to defend against this failing foundation, but who knows if it’ll get adopted.
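If “content addressable” still sounds abstract, here is a toy sketch of the idea. This is not IPFS's actual API or data format (the real thing splits files into chunks, uses its own identifier scheme, and spreads copies across many peers). The `ContentStore` class is a hypothetical, in-memory stand-in, and using a SHA-256 hash as the address is an assumption made for the illustration.

```python
import hashlib


class ContentStore:
    """A toy, in-memory content-addressable store (illustration only)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its address: the SHA-256 hash of the bytes."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Fetch data by address and verify it still matches that address."""
        data = self._blobs[address]  # raises KeyError if nobody has stored it
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
thesis = b"A history of pizza ovens"
address = store.put(thesis)

# The address depends only on the bytes, never on where they are stored,
# so any copy held by any peer answers to the same name.
assert store.get(address) == thesis
assert store.put(thesis) == address
```

The design choice that matters is that the address is derived from the content itself. A URL names a place, and the link breaks when that place goes away. A content address names the thing, and the link works as long as at least one copy survives somewhere.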
The next time you make something and put it online: think about where it’s going to live, how long it’ll be around, and what you can do to preserve it, even if that means making an extra local backup, or printing it out on a dead tree. The culture we’re building together is increasingly digital, hyperlinked, and accessible from anywhere. But it’s not accessible from any when. We’re losing more and more of it every day. If we’re going to continue making things online, we need to deal with this problem systematically and soon. How? I’m not sure. Maybe IPFS, or something like it that hasn’t been invented yet. Until then, I’ll keep my printer.