URL to Markdown for SEO — Competitive Content Analysis

A content brief that beats the SERP starts with reading the SERP. Convert the top 10 ranking pages for a target query to clean Markdown, feed them to Claude or ChatGPT, and ask for the topic clusters, heading structure, and content gaps the leaders share. What used to be a half-day SEO research task collapses into 15 minutes.

Why this is hard without the right tool

  • Manually analysing 10 competitor pages per target query takes hours
  • Copy-paste destroys heading hierarchy — exactly the thing you need for SERP analysis
  • Content brief tools (Frase, Surfer, Clearscope) cost $99-300/month for what is mostly URL-to-text plus an AI prompt
  • AI content brief generation needs clean, structured input — raw HTML pollutes the analysis
  • Schema markup and FAQ blocks are buried in HTML — hard to extract for competitive comparison

Recommended workflow

  1. Run your target query in Google; collect the top 10 URLs
  2. For one-off briefs: paste each URL into /convert/url-to-markdown and download. For repeat work: roll your own batch script with Trafilatura (one-line URL→Markdown) — we don't ship a programmatic API today
  3. Merge into a single corpus (one ## per source URL) with our Markdown merger
  4. Prompt Claude / ChatGPT: "extract common topic clusters, heading patterns, and content gaps"
  5. Use the synthesis as your content brief; write a piece that covers what the leaders cover plus what they miss

Code examples

SERP-to-brief Python pipeline (OSS — no MDisBetter API needed)

# Install: pip install trafilatura httpx
import asyncio
import httpx
import trafilatura

async def url_to_md(client: httpx.AsyncClient, url: str) -> str:
    try:
        r = await client.get(url, timeout=30, follow_redirects=True,
                             headers={"User-Agent": "Mozilla/5.0 (compatible; SERP-brief/1.0)"})
        r.raise_for_status()
    except httpx.HTTPError:
        return ""  # skip dead or error-returning URLs instead of failing the whole batch
    md = trafilatura.extract(r.text, output_format="markdown",
                             include_links=True, include_tables=True)
    return md or ""

async def main(serp_urls):
    async with httpx.AsyncClient() as client:
        mds = await asyncio.gather(*[url_to_md(client, u) for u in serp_urls])
    corpus = "\n\n---\n\n".join(f"## {u}\n\n{md}" for u, md in zip(serp_urls, mds))
    with open("brief-corpus.md", "w", encoding="utf-8") as f:
        f.write(corpus)
    # Now feed brief-corpus.md to Claude with: "Extract common H2 patterns and gaps"

asyncio.run(main([
    "https://example.com/article-1",
    "https://example.com/article-2",
    # ...top 10 SERP results
]))
# For ad-hoc URLs you don't want to script, paste them into
# mdisbetter.com/convert/url-to-markdown and skip the code path entirely.

Frequently asked questions

How does this compare to Frase, Surfer, or Clearscope?
Those tools are URL-to-text plus an AI prompt with a nice UI, charged at $99-300/month. The MDisBetter web tool covers the URL-to-text part for free on a per-URL basis; pair with your own Claude or ChatGPT subscription and you reproduce 80% of the value at 0% of the price. For batch SERP-to-brief workflows, a 30-line Python script using Trafilatura plus the Anthropic API does the same job — the remaining 20% (NLP scoring against your draft) is buildable in an afternoon if you actually need it.
Can I extract just the H2/H3 heading structure for outline analysis?
Yes — after conversion, a 5-line script (or a regex on <code>^###? </code>) pulls just the H2/H3 headings. Stack them across the top 10 SERP results and the content gap analysis writes itself. Useful for spotting the "missing H2" that nobody on page 1 covers.
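The 5-line script is roughly this (the sample corpus string is illustrative):

```python
import re

def extract_headings(markdown: str) -> list[str]:
    """Pull only the H2/H3 lines out of a converted Markdown page."""
    return re.findall(r"^###? .+$", markdown, flags=re.MULTILINE)

# Stack the output of this per competitor to eyeball the shared outline
corpus = "## Intro\n\nbody text\n\n### Pricing\n\nmore text\n#### too deep\n"
print(extract_headings(corpus))  # → ['## Intro', '### Pricing']
```

Run it once per converted competitor page and diff the lists — the heading nobody has is your gap.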
Does it capture FAQ schema and PAA-targeted content?
Yes — FAQ blocks come through as a clean <code>## FAQ</code> section with each Q/A as nested headings or bold/paragraph pairs. Easy to extract and compare across competitors to spot the questions everyone answers (table stakes) and the ones nobody does (your opportunity).
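Assuming the converted pages surface each question as an H3 under a <code>## FAQ</code> heading (the exact shape varies by source page — adjust the patterns to what your conversion actually emits), a rough cross-competitor tally looks like:

```python
import re
from collections import Counter

def faq_questions(markdown: str) -> list[str]:
    """Grab H3-level questions inside the FAQ section of one converted page."""
    m = re.search(r"^## FAQ\s*$(.*?)(?=^## |\Z)", markdown, flags=re.M | re.S)
    if not m:
        return []
    return [q.strip() for q in re.findall(r"^### (.+)$", m.group(1), flags=re.M)]

# Illustrative pages; in practice these are your converted competitor files
pages = [
    "## FAQ\n### What does it cost?\n...\n### Is there an API?\n...",
    "## FAQ\n### What does it cost?\n...",
]
counts = Counter(q for page in pages for q in faq_questions(page))
# High counts = table stakes; questions with count 1 (or missing) = your opportunity
```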
Best practice for AI-generated content briefs?
Convert the top 10 SERP results, merge, then prompt: "(1) extract common topic clusters and the H2 each ranks under; (2) flag topics covered by 1-2 results only — these are gaps; (3) suggest 3-5 H2s our piece should add to win the SERP". The structured Markdown input makes the analysis concrete, not generic.
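That prompt can be assembled mechanically on top of the merged corpus file; the three-part ask is the one above, everything else is plumbing (the filename matches the pipeline example):

```python
BRIEF_PROMPT = """You are an SEO analyst. Below are the top-ranking pages for one query,
converted to Markdown, one `## <url>` section per page.

(1) Extract common topic clusters and the H2 each ranks under.
(2) Flag topics covered by only 1-2 results -- these are gaps.
(3) Suggest 3-5 H2s our piece should add to win the SERP.

{corpus}"""

def build_prompt(corpus_path: str = "brief-corpus.md") -> str:
    """Wrap the merged SERP corpus in the content-brief prompt."""
    with open(corpus_path, encoding="utf-8") as f:
        return BRIEF_PROMPT.format(corpus=f.read())
```

Paste the result into Claude or ChatGPT, or send it through their APIs for batch work.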
How do I scale this across 100+ target keywords?
Self-rolled worker queue: for each target keyword run SERP scraping via SerpAPI / DataForSEO, push the top 10 URLs through Trafilatura in parallel (it handles ~100 URLs/minute on a single worker), save merged corpus per keyword, batch-feed to Claude via the Anthropic API in another worker. A 100-keyword content programme processes in a couple of hours and a few dollars of API spend. MDisBetter doesn't expose its own API — the OSS path is what makes this scale.
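A minimal shape for that queue, with the SERP fetch and LLM call stubbed out — <code>fetch_serp</code> and <code>summarise</code> are placeholders where the real SerpAPI/DataForSEO and Anthropic API calls would go, and the URL→Markdown step is elided (see the pipeline example above):

```python
import asyncio

async def fetch_serp(keyword: str) -> list[str]:
    # Placeholder: swap in a SerpAPI / DataForSEO call returning the top 10 URLs
    return [f"https://example.com/{keyword}/{i}" for i in range(10)]

async def summarise(keyword: str, corpus: str) -> str:
    # Placeholder: swap in an Anthropic API call with your brief prompt
    return f"brief for {keyword} ({len(corpus)} chars of corpus)"

async def process_keyword(keyword: str, sem: asyncio.Semaphore) -> str:
    async with sem:  # bound concurrency to stay inside SERP/LLM rate limits
        urls = await fetch_serp(keyword)
        corpus = "\n\n".join(f"## {u}" for u in urls)  # URL→Markdown step elided
        return await summarise(keyword, corpus)

async def run(keywords: list[str], workers: int = 8) -> list[str]:
    sem = asyncio.Semaphore(workers)
    return await asyncio.gather(*[process_keyword(k, sem) for k in keywords])

briefs = asyncio.run(run(["url to markdown", "html to markdown"]))
```

The semaphore is the whole trick: 100 keywords queue up, 8 run at once, and rate limits stay happy.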

Try the tool free →