URL to Markdown for Journalists — Archive Sources Safely
Many of the URLs cited in last year's reporting no longer resolve, and others have been quietly edited since publication. Convert each source URL to Markdown the moment you cite it and you have a frozen, plain-text, timestamped record: the digital equivalent of clipping the morning paper. Pair it with an archive.org snapshot for belt-and-braces source preservation across an investigation.
Why this is hard without the right tool
- Sources get deleted or edited after publication
- Need timestamped proof of original page content
- Archiving for investigations across hundreds of pages
- Building source libraries that don't depend on the live web
Recommended workflow
- As you cite a source, open /convert/url-to-markdown and paste the URL
- Click Convert and download the .md file, then store it under sources/YYYY-MM-DD/slug.md
- Add a YAML front matter block with the source URL, fetch date, byline, and an archive.org snapshot URL captured separately
- Reference the local Markdown copy from your draft, with the live URL preserved in the citation
- For PDF source documents (court filings, leaked memos, financial reports), run them through /convert/pdf-to-markdown so the whole source library lives in one consistent format
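The filing and front matter steps above can be scripted once the .md file has been downloaded. The following is a minimal sketch, not part of the tool itself: the function names (`slugify`, `yaml_quote`, `archive_source`) and the front matter key names (`url`, `fetched`, `byline`, `archive_snapshot`) are assumptions chosen for illustration, and the snapshot URL is passed in because it is captured separately.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def slugify(title: str) -> str:
    """Lowercase the title and collapse every run of non-alphanumerics to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def yaml_quote(value: str) -> str:
    """Double-quote a value so colons in URLs or bylines remain valid YAML."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def archive_source(markdown: str, url: str, title: str, byline: str,
                   snapshot_url: str, root: Path = Path("sources"),
                   fetched: Optional[date] = None) -> Path:
    """Write the converted Markdown to root/YYYY-MM-DD/slug.md with front matter.

    Key names below are hypothetical; rename them to match your own library.
    """
    fetched = fetched or date.today()
    front_matter = "\n".join([
        "---",
        f"url: {yaml_quote(url)}",
        f"fetched: {fetched.isoformat()}",
        f"byline: {yaml_quote(byline)}",
        f"archive_snapshot: {yaml_quote(snapshot_url)}",
        "---",
        "",
    ])
    target = root / fetched.isoformat() / f"{slugify(title)}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(front_matter + "\n" + markdown, encoding="utf-8")
    return target
```

Calling `archive_source(text, url, "Council Minutes, 3 March", byline, snapshot_url)` would file the copy as `council-minutes-3-march.md` under today's date folder. Quoting every string value is a deliberate choice here: URLs contain colons, which unquoted YAML can misread.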
Frequently asked questions
Is a Markdown archive enough for legal source preservation?
For day-to-day reporting, it's a strong evidence trail — plain text with timestamp metadata, the same level of preservation a clipped newspaper provides. For court-admissible chain-of-custody preservation (defamation defence, subpoena response), you want a dedicated service like Page Vault, Hunchly, or a notarised WebCite-style capture. Markdown conversion is content extraction; specialised forensic capture is a separate workflow.
How do I capture a page before a source notices and edits it?
Speed matters more than tooling. The instant you decide a page is reportable, paste the URL into the MDisBetter web tool and download the Markdown. Total time: under 30 seconds. In parallel, submit the URL to web.archive.org/save/ for a public snapshot. Both records together survive the source's deletion or edit.
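The parallel archive.org submission can also be triggered from a script. The sketch below assumes that a plain GET to web.archive.org/save/ followed by the source URL requests a capture and redirects to it; rate limits, login requirements, and the exact response behaviour are not covered by this article, so treat the returned address as something to verify by hand. The function names and the User-Agent string are hypothetical.

```python
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_request_url(source_url: str) -> str:
    """Build the save address by appending the source URL exactly as given."""
    return SAVE_ENDPOINT + source_url


def request_snapshot(source_url: str, timeout: float = 60.0) -> str:
    """Request a capture and return the URL the response finally resolved to.

    Assumption: the service answers a plain GET and redirects to the capture.
    Network errors are left to propagate so a failed capture is never silent.
    """
    req = urllib.request.Request(
        save_request_url(source_url),
        headers={"User-Agent": "source-archiver-sketch/0.1"},  # hypothetical
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()
```

Record whatever `request_snapshot` returns in the source file's front matter only after opening it and confirming that the capture shows the page as you saw it.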
Can I archive social media posts, including ones that get deleted?
For public posts at the moment you read them, yes: paste the post URL and convert. For platforms that aggressively client-render (X, Threads, Bluesky), the JS-rendered fallback handles most cases but not all. For high-stakes capture (a deletable tweet from a public figure), pair the conversion with a screenshot tool and an archive.org submission. Markdown gives you the searchable text; the screenshot gives you the visual evidence.
What is the best practice for organising an investigation's source library?
One folder per investigation, sub-folders by date, one Markdown file per source with consistent YAML front matter (url, fetched, author, role, relevance). Drop the whole folder into Obsidian or a private Git repo. Cross-link related sources with Markdown links. By month three of the investigation, you have a navigable, searchable archive instead of a hundred browser tabs and a Drive folder of screenshots.
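Consistent front matter is easier to keep when something checks it. Here is a rough sketch that walks an investigation folder and reports which of the five keys named above are missing from each Markdown file. It uses a deliberately naive line-based parse rather than a YAML library, so nested or multi-line values are outside its scope; the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Set

REQUIRED_KEYS = ("url", "fetched", "author", "role", "relevance")


def front_matter_keys(text: str) -> Set[str]:
    """Return top-level keys found between a leading '---' and the next '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys: Set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Skip indented (nested) lines and comments; keep only 'key: value'.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return set()  # no closing delimiter: treat the file as lacking front matter


def missing_front_matter(root: Path) -> Dict[str, List[str]]:
    """Map each .md file under root (by relative path) to its missing keys."""
    report: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.md")):
        found = front_matter_keys(path.read_text(encoding="utf-8"))
        missing = [key for key in REQUIRED_KEYS if key not in found]
        if missing:
            report[path.relative_to(root).as_posix()] = missing
    return report
```

Running `missing_front_matter(Path("my-investigation"))` before each commit to the private Git repo gives a quick list of sources that still need metadata; an empty dictionary means every file carries all five keys.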
Does this work behind paywalls (NYT, FT, Bloomberg) for sources I have access to?
The MDisBetter web tool fetches anonymously and won't see paywalled content. For pages your authenticated session can already access, the right path is the SingleFile or Save Page WE browser extension: it captures the rendered HTML from your logged-in tab, which you can then convert in MDisBetter. Don't redistribute beyond fair use; the conversion is your reporting aid, not a syndication channel.