Link Rot Is Killing Your Research — Save Web Content as Markdown
You bookmarked a great article three years ago. Today you click the bookmark and get a 404 — or worse, a parked domain serving slot-machine ads. The article is gone. The site reorganized, the publication folded, the author switched platforms, the URL slug changed in a CMS migration. Your bookmark is a tombstone. This is link rot, and it has been the silent destroyer of personal research libraries for two decades. The fix isn't a better bookmark manager. It's saving the content itself.
What is link rot
Link rot is the gradual, predictable decay of URLs over time. Pages disappear; URLs change; entire sites go dark. The problem is well-studied: research over the past decade consistently finds that roughly a quarter of web links die within five years, and over half within a decade. The exact percentages vary by source domain — academic citations decay faster than government pages, news articles decay faster than reference works — but the trajectory is universal. The web you bookmarked is not the web that will be there when you go back.
The mechanisms are mundane:
- The publication shut down or got acquired and migrated content (poorly).
- The CMS changed and the URL slug structure changed with it. Old URLs return 404.
- The author left and the site quietly removed their work.
- The page got behind a paywall that didn't exist when you bookmarked it.
- The site exists but the article was "archived", which usually means "deleted".
- The content was edited — sometimes substantially — and the version you remember no longer exists.
None of these are exotic. They happen to bookmarks every week.
Why bookmarks fail
A bookmark stores a URL. It does not store the content. When the URL stops resolving, the bookmark contains nothing useful — just a string that points at a 404. Browser bookmarks, Pocket, Raindrop, Notion web clipper (in URL-only mode), Pinboard's bare links: all of them have the same fundamental flaw. They optimize for organization and tagging, not preservation.
Read-later apps with full-text capture (Pocket Premium, Instapaper, Readwise Reader) do better — they store a snapshot. But the snapshot lives inside the app. If the app pivots, raises prices, or shuts down, you lose the archive too. Several major read-later services have done exactly this in the past decade.
Internet Archive's Wayback Machine is a wonderful resource for retrieving lost pages, but it's not personal: you can't tag, search, or organize captures for your own use, and not every page is captured at the version you cared about.
Convert and save as Markdown
The robust pattern is to save the actual content of every article you want to keep — in a format you control, in storage you control, in a form designed to last. Markdown is that format.
Why Markdown specifically:
- Plain text. Will be readable in 30 years on any machine that can open a text file. No proprietary format risk.
- Structure preserved. Headings, lists, links, quotes survive intact — unlike a copy-pasted text dump.
- Compact. A 5,000-word article is 30 KB of Markdown. A million such articles fit on a thumb drive.
- Searchable. Any text editor, grep, or note app can search across thousands of files instantly.
- Tool-portable. Obsidian, Logseq, VS Code, Notion, plain folders — every modern note tool reads Markdown. You're not locked in.
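The searchability claim is easy to verify: plain text needs no special tooling. A minimal sketch, standard library only (the function name and return shape are my own, not from any particular tool):

```python
from pathlib import Path

def search_archive(root: str, term: str) -> list[tuple[str, int, str]]:
    """Case-insensitive search across every .md file under root.
    Returns (filename, line number, matching line) tuples."""
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, 1):
            if term.lower() in line.lower():
                hits.append((path.name, lineno, line.strip()))
    return hits
```

`grep -ri` from a shell does the same job; the point is that any tool you have in ten years will too.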
Use URL to Markdown: paste the URL, download a clean .md file with the article body, headings, and links preserved. Save the file in a folder you back up. The article is now yours, regardless of what happens to the source URL.
Build a personal archive
A practical setup that scales from "a few articles a month" to "thousands a year":
1. Choose a folder
One root folder for all archived articles. Local disk synced via iCloud, Dropbox, OneDrive, or Git — pick one and stick with it. Avoid app-specific stores; a folder of files is the most portable option.
2. Use a naming convention
One that survives sorting and search. A common pattern:
YYYY-MM-DD-author-shortname-title-slug.md

Example: 2026-04-15-pchaigne-tokenomics-deep-dive.md. The date prefix sorts chronologically; the slug is human-scannable.
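The convention is mechanical enough to script. A sketch, assuming the pattern above (the function name and slug rules are my own, not a fixed standard):

```python
import re
from datetime import date

def archive_filename(published: date, author: str, title: str) -> str:
    """Build a YYYY-MM-DD-author-title-slug.md filename that sorts
    chronologically and stays human-scannable."""
    def slugify(text: str) -> str:
        text = text.lower()
        text = re.sub(r"[^a-z0-9]+", "-", text)  # collapse non-alphanumerics into hyphens
        return text.strip("-")
    return f"{published.isoformat()}-{slugify(author)}-{slugify(title)}.md"
```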
3. Add a metadata header
At the top of each file, a short YAML or comment block:
---
source_url: https://example.com/article
author: Author Name
published: 2026-04-15
fetched: 2026-05-10
tags: [tokenomics, defi, research]
---

This makes the file self-describing. Years later, you'll know where it came from, when you saved it, and why.
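Prepending the header is trivial to automate. A sketch that assembles the front matter and body (field names follow the example above; the helper itself is hypothetical):

```python
def with_front_matter(body: str, source_url: str, author: str,
                      published: str, fetched: str, tags: list[str]) -> str:
    """Prepend a YAML front matter block so the file is self-describing."""
    header = "\n".join([
        "---",
        f"source_url: {source_url}",
        f"author: {author}",
        f"published: {published}",
        f"fetched: {fetched}",
        f"tags: [{', '.join(tags)}]",
        "---",
        "",  # blank line between header and body
    ])
    return header + body
```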
4. Tag for retrieval
Folders OR tags, not both. Folders by topic if you have stable categories; tags if your interests cross-cut. Don't try to maintain both — the maintenance cost outpaces the benefit.
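With tags in the front matter, retrieval is a few lines of script. A rough sketch that builds a tag-to-files index; note it parses only the simple `tags: [a, b]` form shown earlier, not full YAML:

```python
import re
from pathlib import Path
from collections import defaultdict

def tag_index(root: str) -> dict[str, list[str]]:
    """Map each tag to the archive files that carry it, reading the
    tags: [a, b] line from each file's front matter."""
    index = defaultdict(list)
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        match = re.search(r"^tags:\s*\[(.*?)\]", text, re.M)
        if match:
            for tag in match.group(1).split(","):
                index[tag.strip()].append(path.name)
    return dict(index)
```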
5. Back up
The whole point is durability. Three locations: local disk, cloud sync, and an offsite copy (an external drive, a second cloud, a Git repo). The 3-2-1 backup rule applies as much to your personal archive as to anything else.
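The second and third copies can be automated too. A minimal sketch that mirrors the archive folder to another location (paths are placeholders; real setups usually lean on a sync client, a cron job, or a Git remote instead):

```python
import shutil
from pathlib import Path

def mirror_archive(src: str, dest: str) -> int:
    """Copy the whole archive folder to a second location,
    replacing any previous mirror. Returns the number of files copied."""
    dest_path = Path(dest)
    if dest_path.exists():
        shutil.rmtree(dest_path)  # refresh the mirror wholesale
    shutil.copytree(src, dest_path)
    return sum(1 for p in dest_path.rglob("*") if p.is_file())
```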
Make it part of the routine
The discipline that makes this work: convert immediately. The cost of doing it now is thirty seconds. The cost of doing it in a year is the article being gone.
A workable habit:
- Reading anything you'd want to find again later? Open the converter in a new tab.
- Paste the URL, download the file, drop it in your archive folder.
- Continue reading.
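The converter does the heavy lifting in that loop, but the capture step itself is nothing magical. A deliberately rough standard-library sketch that handles only headings, paragraphs, and links (a real converter handles far more, and this is not how any particular tool works internally):

```python
from html.parser import HTMLParser

class ArticleToMarkdown(HTMLParser):
    """Very rough HTML-to-Markdown conversion: headings, paragraphs,
    and links only. Illustrates the capture step, nothing more."""
    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.out.append("\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag == "a" and self.href:
            self.out.append(f"]({self.href})")
            self.href = None
        elif tag in ("h1", "h2", "h3", "p"):
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)

    def markdown(self) -> str:
        return "".join(self.out).strip()
```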
After a month this is muscle memory. After a year you have a few hundred articles you control. After five years you have a research library that is comprehensively immune to link rot.
Bonus: feed your archive to AI
An archive of Markdown files is, conveniently, also the ideal input format for any LLM. You can:
- Drop the folder into a Claude Project for an assistant that knows your reading.
- Index the files with embeddings for semantic search across your archive.
- Build a personal RAG over your own curated knowledge — see RAG pipeline guide.
This is one of the most underrated payoffs of saving as Markdown specifically. Your archive becomes both a personal library and an AI knowledge base, with no extra work.
What about PDFs you've collected?
The same logic applies to PDFs you want to preserve and query: convert them to Markdown for long-term archival readability and easy AI ingestion. See the PDF to Markdown guide and the companion piece on PDF to Markdown for academic research for the document workflow.
The cost of doing nothing
Link rot is one of those problems that feels low priority until the day you need an article and find it gone. By that point, the damage is done — you can't retroactively save what's already disappeared. The only intervention that works is preventive: capture content as you encounter it, in a format that lasts. Markdown is that format. The conversion takes thirty seconds. The archive lasts decades.
What kinds of content are most at risk
Not all web content rots at the same rate. The categories most likely to vanish:
- Personal blogs. Hosted on platforms that fold; written by people who eventually move on; rarely backed up. The decay rate here is brutal.
- Medium and other platform posts. When the platform changes its model, paywalls posts, or the writer migrates, the original URL often returns nothing useful.
- Technology vendor blog posts. Acquisitions, rebrandings, and CMS migrations routinely orphan years of posts. When a vendor changes its product line, the case studies for the old product often disappear quietly.
- News articles older than 5 years. Even major publishers reorganize their archives. Specific URLs from a decade ago resolve maybe 60-70% of the time.
- Conference talks and slides. Conference websites die when the conference series ends. Speakers' personal sites die when the speakers stop maintaining them.
- Open-source project documentation. When a project's main repo moves or its docs site goes through a rewrite, old links break en masse.
Anything you ever want to cite, quote, build on, or look at again falls into one of these categories. The chance that a randomly chosen URL from your bookmarks of five years ago still resolves to the content you remember is a coin flip at best.
The discipline of selective archival
You don't need to save everything you read. The discipline that scales: archive the things you can imagine wanting to revisit. A reasonable filter:
- Did this article change my mind on something?
- Will I want to cite or quote this in something I'll write later?
- Is this reference material I might need again (a tutorial, a deep technical explanation, a definitive overview)?
- Does this contain data or analysis I might want to verify or build on?
If yes to any, archive immediately. The trade-off is wildly asymmetric: thirty seconds now versus permanent loss later. If no to all, skip it; you don't need a maximalist archive.
Tools that pair well with a Markdown archive
The archive is the foundation; what you do with it is the value. A few combinations that work especially well:
Obsidian. Open the archive folder as an Obsidian vault. Backlinks, graph view, and full-text search across everything you've saved, all on local files you own. The combination is one of the most powerful personal-knowledge setups available.
VS Code with Markdown All in One. If you already live in VS Code, your archive is browsable, searchable, and editable in your existing editor. The repo workflow (Git for versioning) gives you backup automatically.
A static site generator. Some people publish their archive as a personal "things I've read" site. Eleventy, Hugo, or Astro consume Markdown and produce a searchable web archive in minutes.
Claude Projects or ChatGPT Custom GPTs. Drop the archive folder into a knowledge base for an AI assistant that knows everything you've read. "What did that article about post-incident reviews recommend for blameless culture?" — answered from the file you saved 18 months ago.
The archival mindset
Once you start archiving, the relationship with the web shifts. Articles stop being ephemeral things you read once. They become persistent assets in your library. You build on them. You revisit them. You find connections between things you read months apart. The web becomes the source of your library, not its replacement.
This isn't a marginal upgrade. For people whose work depends on what they read, it's one of the most under-implemented productivity changes available. The tools have existed for years. The only thing that was missing was a converter that produces clean Markdown reliably enough to make the archival step feel free. That's now solved.