What is link rot and how does it threaten the web?

If you’ve been browsing the web and encountered a 404 error page or an unexpected redirect, you’ve seen link rot in action. Over time, the ties that bind the web are breaking, threatening our shared cultural history. Here’s a look at why this happens.

What is link rot?

Link rot occurs when website links break over time, leaving behind broken or dead links. A "broken link" is one that no longer points to the target it pointed to when it was first created. When you click one of these links, you either get a 404 error or land on the wrong page or website.

Link rot is common. A 2021 Harvard study examined hyperlinks in more than 550,000 New York Times articles published between 1996 and 2019 and found that 25% of links to specific pages were inaccessible, with older links rotting at much higher rates: around 6% of links from 2018 were dead, compared to 72% of links from 1998. Another study found that of a set of 360 links gathered in 1995, only 1.6% were still working in 2016.
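To see what "checking for link rot" looks like in practice, here is a minimal sketch of a link checker. It sends a HEAD request to each URL and treats 4xx/5xx responses or connection failures as rotted links. The example URLs are placeholders, not links from the studies above.

```python
# Minimal link-rot checker: a HEAD request per URL, with 4xx/5xx responses
# or outright connection failures classified as "dead".
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError


def is_dead(status):
    """Classify an HTTP status code; None means the request failed entirely."""
    return status is None or status >= 400


def check_link(url, timeout=10.0):
    """Return the HTTP status for a URL, or None on a network failure."""
    try:
        req = Request(url, method="HEAD")
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code   # e.g. 404 for a dead link
    except URLError:
        return None       # DNS failure, refused connection, expired domain


if __name__ == "__main__":
    for url in ["https://example.com/", "https://example.com/no-such-page"]:
        status = check_link(url)
        print(url, "DEAD" if is_dead(status) else "alive", status)
```

A real crawl at the scale of these studies would also need retries, redirect tracking, and rate limiting, but the classification logic is the same.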

Why does link rot happen?

The web is a fluid, decentralized medium with no central authority, so content can become unavailable at any time without warning. Servers come and go, websites shut down, services migrate to new hosts, software gets updated, publications move to new content management platforms without migrating their content, domains expire, and so on.

A related problem is "content drift": the link still works, but the page it points to has changed since the link was created. This causes problems because the original author intended the link to point to different information than what readers now find.
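One simple way to illustrate drift detection (a sketch, not a standard tool): record a fingerprint of the page's text when you first cite it, then compare it against the live page later. Any mismatch means the content has changed, or the page has died.

```python
# Illustrative content-drift check: hash the page text at citation time,
# then re-hash the live page later and compare fingerprints.
import hashlib


def fingerprint(page_text):
    """Hash the page body so even a one-character edit changes the result."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def has_drifted(original_fp, current_text):
    """True if the live text no longer matches the stored fingerprint."""
    return fingerprint(current_text) != original_fp


original = fingerprint("The quarterly report shows a 3% rise.")
print(has_drifted(original, "The quarterly report shows a 3% rise."))  # False
print(has_drifted(original, "The quarterly report shows a 9% rise."))  # True
```

In practice a raw hash is too strict, since ads, timestamps, and other dynamic elements change on every load; a real system would normalize or extract the main article text before fingerprinting.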

What’s so bad about losing old websites?

It is the nature of the world that things decay and disappear. Keeping information alive is an active process that takes time, energy and effort. So the main problem with link rot is not necessarily that we have to store all information forever, but that electronic information and references have potentially become more fragile and vulnerable than the paper-based ones primarily used in the past.

Many authors of journalistic articles, academic papers, and even court decisions use web links as citations, providing vital sources of context for the information presented. This has been a problem for Wikipedia too. As Jonathan Zittrain explained in a 2021 article on link rot for The Atlantic, "Sourcing is the glue that holds humanity's knowledge together. It's what allows you to learn more about what's only briefly mentioned in an article like this one, and for others to double-check the facts as I represent them."

If links break and sources become unavailable, it is much more difficult for a reader to judge whether the author has honestly and faithfully represented the original source of information. And even beyond the links, some websites provide information online that cannot be found anywhere else. The loss of these pages creates gaps in the collective knowledge of humanity and holes in the fabric of our common culture.

What is the solution to link rot?

Experts consider link rot and content drift to be endemic to the web as it is currently designed. That means it’s a part of the fundamental nature of the web that won’t go away unless we actively try to fix it or mitigate it.

One of the most effective solutions to the problem of link rot emerged in 1996 with the Internet Archive, which has maintained a public archive of billions of websites over the past 25 years. If you find a broken link, visit the Internet Archive’s Wayback Machine and paste the link into its search bar. If the site was captured, you will be able to browse the results. Or if the site has recently gone down, it may be possible to display original content from a cached copy stored by Google.
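The Wayback Machine lookup described above can also be done programmatically through the Internet Archive's public availability API (`https://archive.org/wayback/available`). The sketch below builds the query URL and pulls the closest archived snapshot out of the JSON response; field access is kept defensive in case no snapshot exists.

```python
# Query the Internet Archive's Wayback Machine availability API for the
# closest archived snapshot of a URL.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_url(target, timestamp=None):
    """Build the API query URL; timestamp (YYYYMMDD) requests the closest snapshot to that date."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"


def closest_snapshot(target):
    """Return the URL of the closest archived snapshot, or None if none exists."""
    with urlopen(availability_url(target), timeout=10) as resp:
        data = json.load(resp)
    snap = data.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None


if __name__ == "__main__":
    print(closest_snapshot("example.com"))
```

If `closest_snapshot` returns a URL, opening it in a browser shows the archived copy of the page, exactly as pasting the link into the Wayback Machine's search bar would.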

Beyond the Internet Archive, a Harvard-led project called Perma.cc captures permanent versions of websites for long-term academic and legal citation. A library consortium maintains the links, so they should stick around for a while. The goal is to create links that don't rot: they should persist as long as the Perma.cc archive is maintained.

Other potential solutions to link rot are still experimental, including Web3 approaches and distributed data hosting through protocols such as IPFS. Ironically, hundreds of years from now, the only surviving websites from our era may be the ones people printed out on paper. Stay safe out there!

RELATED: How to Print Web Pages Without Ads and Other Clutter