URL to Markdown for SEO — Competitive Content Analysis
A content brief that beats the SERP starts with reading the SERP. Convert the top 10 ranking pages for a target query to clean Markdown, feed them to Claude or ChatGPT, and ask which topic clusters and heading structures the leaders share, and which content gaps they all leave open. What used to be a half-day SEO research task collapses into 15 minutes.
Why this is hard without the right tool
- Manually analysing 10 competitor pages per target query takes hours
- Copy-paste destroys heading hierarchy — exactly the thing you need for SERP analysis
- Content brief tools (Frase, Surfer, Clearscope) cost $99-300/month for what is mostly URL-to-text plus an AI prompt
- AI content brief generation needs clean, structured input — raw HTML pollutes the analysis
- Schema markup and FAQ blocks are buried in HTML — hard to extract for competitive comparison
Recommended workflow
- Run your target query in Google; collect the top 10 URLs
- For one-off briefs: paste each URL into /convert/url-to-markdown and download. For repeat work: roll your own batch script with Trafilatura (one-line URL→Markdown) — we don't ship a programmatic API today
- Merge into a single corpus (one ## per source URL) with our Markdown merger
- Prompt Claude / ChatGPT: "extract common topic clusters, heading patterns, and content gaps"
- Use the synthesis as your content brief; write a piece that covers what the leaders cover plus what they miss
Code examples
SERP-to-brief Python pipeline (OSS — no MDisBetter API needed)
# Install: pip install trafilatura httpx
import asyncio

import httpx
import trafilatura

async def url_to_md(client: httpx.AsyncClient, url: str) -> str:
    r = await client.get(url, timeout=30, follow_redirects=True,
                         headers={"User-Agent": "Mozilla/5.0 (compatible; SERP-brief/1.0)"})
    md = trafilatura.extract(r.text, output_format="markdown",
                             include_links=True, include_tables=True)
    return md or ""

async def main(serp_urls):
    async with httpx.AsyncClient() as client:
        mds = await asyncio.gather(*[url_to_md(client, u) for u in serp_urls])
    corpus = "\n\n---\n\n".join(f"## {u}\n\n{md}" for u, md in zip(serp_urls, mds))
    with open("brief-corpus.md", "w", encoding="utf-8") as f:
        f.write(corpus)
    # Now feed brief-corpus.md to Claude with: "Extract common H2 patterns and gaps"

asyncio.run(main([
    "https://example.com/article-1",
    "https://example.com/article-2",
    # ...top 10 SERP results
]))

# For ad-hoc URLs you don't want to script, paste them into
# mdisbetter.com/convert/url-to-markdown and skip the code path entirely.
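If you want the Claude step scripted too, here is a minimal sketch using the official anthropic SDK. Assumptions: `ANTHROPIC_API_KEY` is set in your environment, and the model ID below is a placeholder you should swap for whichever current Claude model you have access to.

# Install: pip install anthropic
import anthropic

corpus = open("brief-corpus.md", encoding="utf-8").read()
client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID, swap in a current one
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": "Extract common topic clusters, heading patterns, and "
                   "content gaps from these competitor pages:\n\n" + corpus,
    }],
)
print(message.content[0].text)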
Frequently asked questions
How does this compare to Frase, Surfer, or Clearscope?
Those tools are URL-to-text plus an AI prompt with a nice UI, charged at $99-300/month. The MDisBetter web tool covers the URL-to-text part for free on a per-URL basis; pair it with your own Claude or ChatGPT subscription and you reproduce 80% of the value at 0% of the price. For batch SERP-to-brief workflows, a 30-line Python script using Trafilatura plus the Anthropic API does the same job. The remaining 20% (NLP scoring against your draft) is buildable in an afternoon if you actually need it; a starter sketch follows.
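A rough cut of that scoring layer, for the curious. Assumptions: your draft lives in `draft.md` (a hypothetical filename) next to the `brief-corpus.md` produced above; plain bag-of-words with no lemmatisation or stopword list, so expect some noise.

import re
from collections import Counter

def terms(text: str) -> Counter:
    # Crude tokeniser: lowercase words of 4+ letters (hyphens allowed)
    return Counter(re.findall(r"[a-z][a-z-]{3,}", text.lower()))

corpus_terms = terms(open("brief-corpus.md", encoding="utf-8").read())
draft_terms = terms(open("draft.md", encoding="utf-8").read())  # draft.md is hypothetical

# Terms the SERP leaders use heavily that your draft never mentions
for term, n in corpus_terms.most_common(200):
    if draft_terms[term] == 0:
        print(f"missing: {term} (x{n} across the SERP)")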
Can I extract just the H2/H3 heading structure for outline analysis?
Yes — after conversion, a 5-line script (or a regex on `^###? `) pulls just the H2/H3 headings. Stack them across the top 10 SERP results and the content gap analysis writes itself. Useful for spotting the "missing H2" that nobody on page 1 covers.
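That script, more or less. It assumes the merged `brief-corpus.md` produced by the pipeline above, with sources separated by `---` and one `## <url>` marker line per source; adjust if you merged differently.

import re

corpus = open("brief-corpus.md", encoding="utf-8").read()
for chunk in corpus.split("\n\n---\n\n"):
    lines = chunk.splitlines()
    if not lines:
        continue
    print(lines[0])  # the "## <url>" source marker
    for line in lines[1:]:
        if re.match(r"^###? ", line):  # H2s and H3s from the article body
            print("    " + line)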
Does it capture FAQ schema and PAA-targeted content?
Yes — FAQ blocks come through as a clean `## FAQ` section with each Q/A as nested headings or bold/paragraph pairs. Easy to extract and compare across competitors to spot the questions everyone answers (table stakes) and the ones nobody does (your opportunity).
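A best-effort sketch of that comparison. Assumptions: one converted Markdown file per competitor in a `corpus/` directory (hypothetical layout), and questions rendered as H3-or-deeper headings under a `## FAQ` heading; real pages vary, so treat the patterns as a starting point.

import re
from collections import Counter
from pathlib import Path

questions = Counter()
for path in Path("corpus").glob("*.md"):  # corpus/ layout is hypothetical
    md = path.read_text(encoding="utf-8")
    # Everything from "## FAQ" to the next H2 (or end of file)
    m = re.search(r"^## FAQ[ \t]*\n(.*?)(?=^## |\Z)", md, re.M | re.S)
    if not m:
        continue
    for q in re.findall(r"^#{3,}[ \t]+(.+)$", m.group(1), re.M):
        questions[q.strip().lower()] += 1

for q, n in questions.most_common():
    label = "table stakes" if n >= 3 else "opportunity"
    print(f"{n}x  {q}  [{label}]")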
Best practice for AI-generated content briefs?
Convert the top 10 SERP results, merge them, then prompt: "(1) extract common topic clusters and the H2 each ranks under; (2) flag topics covered by only 1-2 results — these are gaps; (3) suggest 3-5 H2s our piece should add to win the SERP". The structured Markdown input makes the analysis concrete, not generic.
How do I scale this across 100+ target keywords?
Roll your own worker queue: for each target keyword, scrape the SERP via SerpAPI or DataForSEO, push the top 10 URLs through Trafilatura in parallel (it handles ~100 URLs/minute on a single worker), save a merged corpus per keyword, and batch-feed the corpora to Claude via the Anthropic API in a second worker. A 100-keyword content programme processes in a couple of hours and a few dollars of API spend. MDisBetter doesn't expose its own API; the OSS path, sketched below, is what makes this scale.
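A skeleton of that queue, reusing url_to_md() from the pipeline above. Assumptions: a SerpAPI account (pip install google-search-results); DataForSEO slots into the same place with a different client, and the `corpora/` output directory is just a suggestion.

import asyncio
from pathlib import Path

import httpx
from serpapi import GoogleSearch

def top10_urls(keyword: str, api_key: str) -> list[str]:
    serp = GoogleSearch({"q": keyword, "num": 10, "api_key": api_key}).get_dict()
    return [r["link"] for r in serp.get("organic_results", [])][:10]

async def run_keyword(keyword: str, api_key: str) -> None:
    urls = top10_urls(keyword, api_key)
    async with httpx.AsyncClient() as client:
        mds = await asyncio.gather(*[url_to_md(client, u) for u in urls])
    corpus = "\n\n---\n\n".join(f"## {u}\n\n{md}" for u, md in zip(urls, mds))
    out = Path("corpora")  # output layout is a suggestion, not a convention
    out.mkdir(exist_ok=True)
    (out / (keyword.replace(" ", "-") + ".md")).write_text(corpus, encoding="utf-8")

# SERP APIs are rate-limited, so run keywords sequentially; the 10 URL
# fetches per keyword still run in parallel inside run_keyword().
async def run_all(keywords: list[str], api_key: str) -> None:
    for kw in keywords:
        await run_keyword(kw, api_key)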