How to Feed a Website to ChatGPT (Complete Guide)
Feeding a website to ChatGPT sounds like it should be a one-step operation — paste a URL, get an answer. In practice it splits into four very different problems depending on whether you want one page, a section, an entire docs site, or something in between. The browse feature handles maybe 30% of cases well. The other 70% need a different approach. Here's the full playbook.
ChatGPT browse vs manual
ChatGPT can fetch URLs on its own. When it works, this is the easiest path. When it doesn't, the failures are silent — you get an answer that looks confident but is wrong. Knowing which approach to use saves time.
Browse works well when: the page is plain HTML, publicly accessible, not behind Cloudflare, not paywalled, not heavily JavaScript-rendered, and the question is simple enough that ChatGPT only needs the page summary.
Browse fails when:
- The page is bot-blocked. Many major publishers, e-commerce sites, and SaaS tools refuse non-browser traffic.
- The page is JavaScript-rendered. ChatGPT's fetcher sees an empty HTML skeleton; the actual article text never enters the prompt.
- The page is paywalled or login-gated.
- The page is too long. Browse extracts a chunk and silently truncates.
- You need precise quotes. Browse summarizes; the citations frequently misattribute or invent.
For these cases — which add up to most real-world usage — the manual workflow wins.
Single page: the canonical workflow
For one URL you want ChatGPT to deeply understand:
- Open /convert/url-to-markdown.
- Paste the URL, hit convert.
- Copy the Markdown (short pages) or download the `.md` file (long pages).
- Start a fresh ChatGPT conversation. Attach the file or paste the Markdown.
- Ask your question.
The conversion strips navigation, ads, footers, share buttons, modals, and other layout noise — leaving the article body with structure (headings, lists, tables, quotes, links) intact. ChatGPT now has a clean, structured source it can reason over precisely.
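If you'd rather script the conversion than use the web UI, the same idea can be approximated in a few lines. A minimal sketch, assuming the third-party packages requests, readability-lxml, and markdownify; a dedicated converter handles far more edge cases (bot walls, JavaScript rendering, encoding quirks):

```python
import requests
from markdownify import ATX, markdownify
from readability import Document  # readability-lxml


def url_to_markdown(url: str) -> str:
    """Fetch a page, isolate the article body, and convert it to Markdown."""
    html = requests.get(url, timeout=30).text
    doc = Document(html)       # heuristically strips nav, ads, and footers
    body_html = doc.summary()  # HTML of just the main content
    return f"# {doc.title()}\n\n" + markdownify(body_html, heading_style=ATX)


# Illustrative URL; save the result as a file you can attach to ChatGPT.
with open("page.md", "w", encoding="utf-8") as f:
    f.write(url_to_markdown("https://example.com/some-article"))
```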
For why this beats both copy-paste and browse, see why copy-pasting from websites ruins your AI answers.
Multi-page: a section or an entire site
For an entire documentation site, a multi-part article series, or a knowledge base, you have two strategies depending on size.
Under 50 pages
Convert each URL individually, then concatenate the Markdown files into a single document. The combined file is one upload, one prompt, one answer. ChatGPT's longer-context models can hold a few hundred thousand tokens of Markdown comfortably — typically 100-200 pages of documentation.
To concatenate quickly:
```bash
cat page1.md page2.md page3.md > combined.md
```

Or on Windows PowerShell:

```powershell
Get-Content page1.md, page2.md, page3.md | Set-Content combined.md
```

Add a top-level header at the start of each section if you want ChatGPT to keep track of which page each piece came from.
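A scripted version of that header-plus-concatenate step, as a minimal sketch; it assumes one converted `.md` file per page in a `docs_md/` folder (the folder name is illustrative):

```python
from pathlib import Path

# Concatenate converted pages, prepending a top-level header per page
# so ChatGPT can attribute each section to its source file.
pages = sorted(Path("docs_md").glob("*.md"))  # hypothetical folder of converted pages
with open("combined.md", "w", encoding="utf-8") as out:
    for page in pages:
        out.write(f"# Source: {page.stem}\n\n")
        out.write(page.read_text(encoding="utf-8").strip() + "\n\n")
```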
Over 50 pages
At this scale, a single combined upload starts to bump against context limits. Two options:
- Selective extraction. Convert only the pages relevant to your actual question. Most documentation sites have a long tail of pages you'll never ask about. A 200-page site usually compresses to 20-40 relevant pages for any specific query.
- Chunking + retrieval. Split the combined Markdown into chunks (one per page, or one per `##` heading), index them, and retrieve only the relevant chunks per question. This is the RAG pattern; we cover the chunking step in detail in chunking strategies. A splitting sketch follows this list.
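Here's that sketch; it splits naively at `##` lines, so a real pipeline would also cap chunk size and skip fenced code blocks:

```python
import re


def split_on_h2(markdown: str) -> list[str]:
    """Split Markdown into chunks, keeping each ## heading with its body."""
    parts = re.split(r"(?m)^(?=## )", markdown)
    return [p.strip() for p in parts if p.strip()]


with open("combined.md", encoding="utf-8") as f:
    chunks = split_on_h2(f.read())
```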
The convert-to-Markdown method (why this is the right format)
The reason Markdown is the universal answer here is that LLMs read it natively. They were trained on enormous amounts of Markdown — README files, GitHub issues, Stack Overflow posts, technical blogs. Heading levels carry meaning. List indentation conveys hierarchy. Code blocks are unambiguous. Compared with HTML (verbose, full of layout junk), plain text (no structure), or PDF (a printing format the model has to reverse-engineer), Markdown is the easiest format for the model to reason over and the cheapest in tokens.
The same principle applies if your source happens to be a PDF instead of a web page — convert to Markdown first. See our PDF to Markdown converter and the deep-dive on best format for LLM input.
Chunking large sites
For a doc site over a few hundred pages — Stripe docs, AWS docs, a major framework — feeding the whole thing to ChatGPT is impractical no matter how you slice it. The smart pattern:
- Convert the whole site to Markdown (one file per page).
- Index the chunks with embeddings (any vector store: pgvector, Pinecone, Weaviate, or even a local FAISS).
- For each user question, retrieve the top 5-10 relevant chunks.
- Send only those chunks to ChatGPT, along with the question.
This is the RAG pattern. It scales to arbitrarily large corpora because you never send more than a handful of pages per query. We cover end-to-end implementation in RAG pipeline guide — the principles transfer cleanly from PDF sources to URL sources.
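To make the pattern concrete, here is a sketch of the index-and-retrieve steps, assuming the OpenAI embeddings endpoint and the `chunks` list from the splitting sketch above. Brute-force cosine similarity stands in for a vector store, which is fine up to a few thousand chunks; the model name and the example question are illustrative:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])


chunk_vecs = embed(chunks)  # index once, reuse across questions


def top_k(question: str, k: int = 8) -> list[str]:
    q = embed([question])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]


# Send only the retrieved chunks to ChatGPT along with the question.
context = "\n\n---\n\n".join(top_k("How do I rotate API keys?"))
```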
Common gotchas
Login-gated content. Public converters can't authenticate. You need to be logged in yourself, then either copy the rendered article into a Markdown editor or use a browser extension that exports the current page.
Geo-restricted content. Some converters route through specific regions. If the page works in your browser but not in the converter, the fetch IP may be blocked.
Single-page apps. Some SPAs render content only after user interaction (clicking tabs, expanding sections). A static fetch sees only the initial state. For these, navigate to the specific state you want, then export from the browser.
Stale content. Your converted Markdown is a snapshot. If the source page changes, your file is outdated. For frequently-updated content, re-convert before each major use.
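For the JavaScript-rendered and SPA cases above, one workaround is to render the page in a headless browser before converting. A sketch using Playwright, one option among several; it assumes the playwright package plus a one-time `playwright install chromium`, and its output can be fed to the conversion snippet earlier:

```python
from playwright.sync_api import sync_playwright


def rendered_html(url: str) -> str:
    """Return the page's HTML after JavaScript has executed."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for JS to settle
        html = page.content()
        browser.close()
    return html
```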
Doing it for Claude and Gemini
Same workflow, different upload box. Markdown is universal across LLMs. Claude in particular handles very long Markdown contexts well — useful for the multi-page case where ChatGPT might struggle. See how to feed documentation to Claude for Claude-specific tips.
The 30-second test
Next time you want to ask ChatGPT about a web page: try browse, try copy-paste, and try the URL-to-Markdown workflow. Same question, three sources. The Markdown answer is dramatically more accurate, more specific, and more grounded in the actual page content. After running this test once or twice you stop reaching for the other two.
Prompts that work better with a Markdown source
The convert-first workflow doesn't just improve answer quality on whatever question you ask. It unlocks question types that fail without it. Some examples:
- Quote extraction. "Pull every direct quote from the experts cited in this article." Browse mode hallucinates quotes. With the Markdown source, ChatGPT extracts them verbatim with section context.
- Comparative analysis. "Compare this article's argument with the one I uploaded yesterday." Requires both sources to be reliably in context — which means doing the conversion yourself for both.
- Structural questions. "What are the section headings of this guide?" Trivial when headings survive the conversion; impossible when they were collapsed by copy-paste.
- Tabular data extraction. "Convert the comparison table in this article to a CSV." Tables only survive in Markdown form; from copy-paste, the data is scrambled.
- Citation generation. "Cite the section number and paragraph for each claim you make." Requires the LLM to see the document structure, which only Markdown reliably preserves.
When you need depth, not breadth
One more pattern worth knowing: feeding a single article to ChatGPT and asking five increasingly deep questions about it (rather than five different articles each with one question) produces better insight per minute. The convert-first approach makes this practical because the source is in context once, then reused across the whole conversation. Iterative deep-dives benefit disproportionately from clean source material.
What about Custom GPTs and Projects?
If you find yourself asking about the same site repeatedly — a vendor's documentation, a competitor's blog, a specific research source — promoting it from a per-conversation upload to a persistent knowledge base removes enormous friction. ChatGPT's Custom GPTs and Claude's Projects both support knowledge bases. Convert your URLs once, upload the Markdown to the knowledge base, then chat without re-uploading anything. We cover the developer-facing version of this in how to feed documentation to Claude.
The honest summary
Browse mode is convenient when it works. Copy-paste is fast but lossy. Convert-first is the workflow that delivers consistently good answers on any web content, at any scale, on any LLM. The cost is thirty seconds per page once you have the habit. The payoff is that every grounded conversation gets materially better, and the failure modes that used to derail you (silent truncation, hallucinated quotes, vague summaries) simply stop happening.
A note on context window economics
It's worth understanding why feeding clean Markdown extends your effective context window. When raw HTML enters a prompt, every `<div>`, `<span>`, inline style, and embedded script consumes tokens. A typical mid-size article page in raw HTML is 8,000 to 15,000 tokens. The same article as Markdown is 1,500 to 3,500 tokens. Across a long conversation where you're feeding multiple articles, the difference compounds: the same context budget fits either five Markdown articles or one HTML article. For comparative analysis tasks especially — synthesizing across many sources — Markdown is the difference between practical and impossible.
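You can verify the gap on your own sources. A minimal sketch using the tiktoken package; the file names are illustrative, and cl100k_base is one of OpenAI's tokenizer encodings:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

with open("article.html", encoding="utf-8") as f:  # raw page source
    html_tokens = len(enc.encode(f.read()))
with open("article.md", encoding="utf-8") as f:    # converted version
    md_tokens = len(enc.encode(f.read()))

print(f"HTML:     {html_tokens:>6} tokens")
print(f"Markdown: {md_tokens:>6} tokens")
```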