7 min read · MDisBetter

How to Save a Webpage as Plain Text (5 Methods)

Saving a webpage as plain text sounds like a one-step operation but isn't. Different methods give very different results — some preserve structure, some don't; some handle JavaScript-rendered sites, some choke; some are free, some are friction-heavy. Five methods cover the full range. Pick the one that matches your actual goal.

Method 1: Browser Reader Mode

Every modern browser has a Reader Mode (also called Reader View, Distraction-Free Mode, or similar) that strips a page down to article content. It's the fastest path to clean text on most articles.

How to use:

  1. Open the article in your browser.
  2. Click the Reader Mode icon in the address bar (it looks like a book or a stack of lines). It's built into Firefox, Safari, and Edge; Chrome offers a Reading Mode side panel, or a small extension adds a one-click version.
  3. The page reflows to article-only view.
  4. Select all (Ctrl/Cmd+A), copy, paste into your text editor.

Strengths: zero setup, fast, removes most layout noise, works offline if the page is already loaded.

Limitations: Reader Mode fails on many pages it doesn't recognize as articles (multi-section pages, documentation, dashboards). Output is plain text — heading structure flattens; lists may lose bullets; tables are lost.

Best for: news articles, blog posts, simple long-form. Anything where you need just the prose, not the structure.

Method 2: Print to PDF + extract

The browser-print pipeline. Print the page to PDF (Ctrl/Cmd+P → Save as PDF), then extract text from the PDF with any PDF-to-text tool.

How to use:

  1. Open the page in your browser.
  2. Print → Save as PDF. Some browsers offer a "simplified" or "clean" print option that auto-strips ads/nav — use it if available.
  3. Open the PDF in a tool that exports text: macOS Preview and Adobe Acrobat Reader can both do it, and many free tools handle it. If you'd rather script this step, see the sketch after this list.
  4. Export as .txt.
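
The extraction step scripts cleanly too. A minimal sketch using the pypdf library (an assumption; any PDF-to-text tool works), reading the PDF you just printed as page.pdf:

    from pypdf import PdfReader  # pip install pypdf

    reader = PdfReader("page.pdf")
    # Concatenate the text layer of every page.
    text = "\n".join(page.extract_text() for page in reader.pages)

    with open("article.txt", "w", encoding="utf-8") as f:
        f.write(text)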

Strengths: works even on sites with no Reader Mode support; the print stylesheet often produces a cleaner layout than the live page.

Limitations: two-step process; PDF text extraction has its own quirks (column-handling, ligatures, hyphenation). For why this is harder than it looks, see how PDF works internally.

Best for: printable pages, recipes, instructions — content where the print layout is well-curated.

Method 3: Copy-paste

The lazy classic. Select all on the page, copy, paste into a text file.

How to use:

  1. Click somewhere on the page body.
  2. Ctrl/Cmd+A, Ctrl/Cmd+C.
  3. Paste into Notepad, TextEdit, or any text editor. Save as .txt.

Strengths: universal, no tools needed, works on every page.

Limitations: picks up navigation, sidebars, modals, hidden text. Output is messy. Structure (headings, lists, tables) is destroyed. For long pages, the noise often exceeds the signal. We cover the failure modes in why copy-pasting from websites ruins your AI answers.

Best for: short snippets, single paragraphs, quick "I just need this one quote".

Method 4: html2text CLI

For developers, the command line offers html2text (a Python package) and similar tools that convert HTML to plain text more cleanly than copy-paste.

How to use:

  1. Install: pip install html2text.
  2. Fetch the page: curl -L https://example.com/article > page.html.
  3. Convert: html2text page.html > article.txt. (A pure-Python version of the same pipeline is sketched below.)
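
If you'd rather do the fetch and the conversion in one place, html2text also works as a Python library. A minimal sketch; the URL is a placeholder:

    import urllib.request

    import html2text  # pip install html2text

    url = "https://example.com/article"  # placeholder
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")

    h = html2text.HTML2Text()
    h.ignore_links = True  # drop link markup for flatter plain text
    h.body_width = 0       # don't hard-wrap long lines
    text = h.handle(html)

    with open("article.txt", "w", encoding="utf-8") as f:
        f.write(text)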

For more control, pandoc can also convert HTML to plain text or to Markdown with finer-grained options.

Strengths: scriptable; fast on batches; produces cleaner output than copy-paste; preserves some structure (depending on flags).

Limitations: requires CLI comfort and a working Python or pandoc install. JavaScript-rendered pages come back nearly empty (curl fetches the raw HTML skeleton before any scripts run); you need a headless browser fetch instead, as sketched below. Default output is Markdown-flavored text, and how much structure survives depends on the flags you set.
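
For JavaScript-heavy pages, do the fetch with a headless browser and hand the rendered HTML to the converter. A minimal sketch assuming Playwright (pip install playwright, then playwright install chromium); the URL is a placeholder:

    from playwright.sync_api import sync_playwright

    url = "https://example.com/article"  # placeholder
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Wait until the network goes quiet so client-side rendering finishes.
        page.goto(url, wait_until="networkidle")
        html = page.content()  # the rendered DOM, not the raw skeleton
        browser.close()
    # Hand `html` to html2text exactly as in the library sketch above.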

Best for: automation pipelines, batch jobs, server-side processing where Python or pandoc is already available.

Method 5: URL to Markdown (best for AI use)

If your goal is to feed the page to an LLM — ChatGPT, Claude, Gemini — plain text is the wrong target. Markdown is. Markdown is plain text plus structure (# for headings, - for lists, | for tables, [text](link) for links). LLMs were trained on huge amounts of Markdown and reason over it dramatically better than over flat text.
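
A toy example of what converted output looks like (the content is invented for illustration):

    # How to Save a Webpage

    Saving a page sounds simple but isn't.

    - Reader Mode is the fastest route
    - Markdown keeps the most structure

    | Method     | Effort  |
    | ---------- | ------- |
    | Copy-paste | none    |
    | Markdown   | seconds |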

How to use:

  1. Open /convert/url-to-markdown.
  2. Paste the URL.
  3. Hit convert. Download the .md file or copy the Markdown text.

Strengths: handles JavaScript-rendered pages; strips navigation/ads/modals; preserves headings, lists, tables, links, quotes; output is much smaller in tokens than raw HTML; works with any LLM.

Limitations: if you genuinely need plain text (not Markdown), you can convert the Markdown to text afterward, but you've added a step. For AI use, Markdown is the better target anyway.

Best for: feeding pages to AI tools, building research archives, anything where you want both readability and structure.

Quick comparison

If you're not sure which to pick:

    Method             Structure kept   JS-rendered pages     Effort       Best for
    Reader Mode        minimal          yes (in-browser)      none         articles, prose
    Print to PDF       layout only      yes (in-browser)      two steps    print-friendly pages
    Copy-paste         none             yes (in-browser)      none         short snippets
    html2text CLI      partial          no (needs headless)   CLI setup    batch automation
    URL to Markdown    full             yes                   seconds      AI use, archives

What about images and code blocks?

None of the plain-text methods preserve images — they're stripped. Markdown preserves image references (alt text + URLs) so you can fetch them separately if needed.
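
In Markdown output, an image reference keeps the alt text and the source URL (both invented here for illustration):

    ![Comparison chart of the five methods](https://example.com/chart.png)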

Code blocks survive better in Markdown than in plain text. Markdown's ``` fences keep the code distinct from prose; plain text usually flattens code into the surrounding paragraphs. For technical documentation, Markdown is the only sensible choice.
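
For instance, a code sample arrives fenced and tagged (the snippet itself is made up):

    ```python
    def slugify(title: str) -> str:
        return title.lower().replace(" ", "-")
    ```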

Saving for the long term

If you're building a personal archive of articles to keep, plain text loses too much (no headings, no links, no structure). Markdown gives you durability plus structure — readable in any text editor, queryable in any note app, immune to format obsolescence. We cover the long-term archival pattern in link rot is killing your research.

The honest summary

For one-off plain text, the browser tools are fine. For anything you'll feed to AI, anything you'll keep, or anything where structure matters: convert to Markdown. The thirty-second cost saves hours of downstream pain.

A quick decision tree

If you want to skip reading and just pick a method:

    Need one quote or a short snippet?         → copy-paste
    Feeding the page to AI, or archiving it?   → URL to Markdown
    Automating a batch job on a server?        → html2text or pandoc
    Page has a well-curated print layout?      → print to PDF + extract
    Simple article, just the prose?            → Reader Mode

Edge cases worth knowing

Foreign-language pages. All five methods handle Unicode correctly in modern tools. The exception: copy-paste from PDFs that use unusual font encodings can produce garbled output. Markdown converters built on a proper HTML parser have no such problem.

Pages with mathematical notation. Plain text destroys equations entirely. Markdown converters with LaTeX support preserve them in $...$ form, which renders correctly in any Markdown viewer that supports MathJax. For technical content with math, this difference is decisive.
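
A one-line illustration of the difference:

    Plain text:  E = mc2        (the superscript is gone)
    Markdown:    $E = mc^2$     (renders as an equation in MathJax-aware viewers)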

Pages with embedded videos or interactive widgets. No method preserves the interactive content itself. Markdown captures the embed link as a regular link; plain text captures only the placeholder text. If the video or widget content matters, you'll need to handle it separately.

Right-to-left languages. All methods preserve the text direction in the saved file. Display behavior depends on the editor or viewer you open the file in.

What about saving the entire page (HTML, CSS, images)?

If your goal is a pixel-perfect copy of the page as it appeared — preserving layout, fonts, images — none of the methods above are right. You want a single-file HTML save (browsers offer this directly via Save As → Webpage Complete or Save as MHTML, and extensions like SingleFile bundle everything into one file). This produces a faithful visual reproduction at the cost of the structured-text benefits Markdown gives you.

The tradeoff is real: visual fidelity vs textual usability. For most archival and AI use cases, textual usability wins decisively — you almost never want to look at the saved page; you want to read, search, or feed it to a tool. Choose accordingly.

Don't overthink it

For most people, the right policy is simple: copy-paste for short snippets, URL to Markdown for everything else. The other three methods exist for specific edge cases — automation pipelines, print-friendly capture, the rare site that defeats Markdown converters. You can ignore them until you actually have one of those edge cases.

The mistake to avoid is reaching for copy-paste reflexively when the content matters. The output is degraded and the cost of switching to a converter is thirty seconds. Build the habit and use it.

A historical note

Saving web content as plain text used to be a reasonable default. In the early 2000s, browsers had no reader modes; HTML-to-Markdown converters were research projects; LLMs didn't exist. The robust archival format was a .txt file because that was what every system reliably read. The defaults made sense for the era.

The defaults haven't aged well. Modern note tools, modern LLMs, modern search systems all do better with structure than without. Plain text still works as a lowest-common-denominator option, but it's the wrong default for new content. The cost of choosing Markdown is essentially zero; the cost of having chosen plain text shows up later, when you wish you still had the headings.

Pick the format that matches what you'll do with the file. For most modern uses, that's Markdown.

Frequently asked questions

Will saving as text capture comments on the article?
Generally no. Comments load via JavaScript after the main article renders, so static fetchers (html2text via curl) never see them, and Reader Mode usually excludes them even though they are on the rendered page. To capture comments, scroll them into view in your browser first, then copy-paste or use a tool that exports the rendered DOM; the headless-browser sketch in Method 4 works here too.
Which method preserves the original article date and author?
Markdown converters typically preserve metadata when it's exposed in the HTML (often via Open Graph or schema.org tags). Plain text methods strip everything except the visible text. If metadata matters, prefer Markdown and check the file header.
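
If you want to check what metadata a page actually exposes, a minimal sketch using BeautifulSoup (an assumption; pip install beautifulsoup4); the URL and the list of tags are illustrative:

    import urllib.request

    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    url = "https://example.com/article"  # placeholder
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")

    soup = BeautifulSoup(html, "html.parser")
    # Open Graph and article metadata live in <meta property="..." content="...">.
    for prop in ("og:title", "article:author", "article:published_time"):
        tag = soup.find("meta", property=prop)
        if tag:
            print(f"{prop}: {tag.get('content')}")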
Can I save an entire site this way, not just one page?
For a few dozen pages, yes — convert each URL one at a time. For larger sites, use the multi-page workflow with a no-code automation tool feeding URLs into the converter on a schedule.
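
If you're comfortable scripting instead, a batch loop over the Method 4 tools is short. A sketch assuming a urls.txt file with one URL per line:

    import urllib.request

    import html2text  # pip install html2text

    h = html2text.HTML2Text()
    h.body_width = 0  # don't hard-wrap lines

    with open("urls.txt", encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]

    for i, url in enumerate(urls):
        with urllib.request.urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        # One numbered output file per input URL.
        with open(f"page-{i:03d}.txt", "w", encoding="utf-8") as out:
            out.write(h.handle(html))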