URL to Markdown for Obsidian: Better Web Clipping
Browser-extension web clippers are convenient but inconsistent: they sometimes capture the wrong region of the page, and they often produce Markdown that's heavy on inline HTML wrappers, image-data-URI bloat, and "Powered by" footers. There is a cleaner way that takes about ten seconds, requires no plugin, and produces vault-ready Markdown. Wrap a tiny script around it and you also get wikilinks, tags, and frontmatter. Here is the workflow.
The problem with most web clippers
Three recurring annoyances:
- Inline HTML survives. Many clippers preserve the original DOM structure including divs, spans, and inline styles. Open a clipped page two months later and the source is half Markdown, half HTML. Hard to edit, hard to grep, ugly in source mode.
- Image bloat. By default some clippers embed images as base64 data URIs. A single article with five photos balloons to 4-8 MB. Multiply by 50 clips per month and your vault doubles in size for no functional gain.
- No structural enrichment. The clipper doesn't add tags, doesn't generate a frontmatter block from page metadata, doesn't link to your daily note. Every clip lands as an isolated file with no graph integration.
You can configure your way around some of this, but the configuration UI is buried and the defaults are wrong for most users.
The mdisbetter approach in one sentence
Convert the URL to clean Markdown via our URL converter, then save the file directly to your vault. The output is GitHub-Flavored Markdown — no inline HTML, no base64 — and Obsidian indexes the new file the next time the app gains focus.
Three workflows, ranked by frequency of use
Workflow 1: Quick clip (browser → vault)
For ad-hoc clipping (a few articles a day):
- Copy the URL of the article you're reading
- Open the URL converter in a pinned tab
- Paste, convert, click Download (saves `.md` with the page title as filename)
- Move the file to your vault (or set the download folder to your vault directly)
10 seconds end-to-end. The output is clean: H1 from page title, H2 for sections, GFM tables, fenced code blocks with language hints. No HTML.
Workflow 2: Frontmatter-enriched clip (OSS scripted)
For a knowledge base where every clip needs structured metadata. MDisBetter doesn't currently expose a programmatic URL-to-Markdown API, so the right path for scripted enrichment is to extract Markdown locally with Trafilatura (an open-source Python library), then write the file with the frontmatter you want:
```python
# pip install trafilatura
import re
from datetime import date
from pathlib import Path

import trafilatura

VAULT = Path.home() / 'Obsidian' / 'MyVault' / 'Clips'

def clip(url, tags=None):
    downloaded = trafilatura.fetch_url(url)
    md = trafilatura.extract(
        downloaded,
        output_format='markdown',
        include_links=True,
        include_tables=True,
    )
    if not md:
        print(f'EXTRACT_FAIL {url}')
        return
    meta = trafilatura.extract_metadata(downloaded)
    title = (meta.title if meta else None) or 'Untitled'
    title = title.replace('"', "'")  # keep the quoted YAML string intact
    safe = re.sub(r'[^\w\s-]', '', title).strip()[:80] or 'Untitled'
    fm = (
        '---\n'
        f'title: "{title}"\n'
        f'source: {url}\n'
        f'clipped: {date.today().isoformat()}\n'
        f'tags: [{", ".join(["clipped"] + (tags or []))}]\n'
        '---\n\n'
    )
    VAULT.mkdir(parents=True, exist_ok=True)
    out = VAULT / f'{safe}.md'
    out.write_text(fm + md, encoding='utf-8')
    print(f'Saved {out.name}')

clip('https://en.wikipedia.org/wiki/Markdown', tags=['reference', 'markdown'])
```

Now every clip lands in your vault with proper YAML frontmatter that Obsidian's Properties view, Dataview, and graph all consume. Tags route the clip to the right corner of your knowledge base automatically.
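The script's output starts with a frontmatter block like this (values illustrative, matching the example call):

```yaml
---
title: "Markdown - Wikipedia"
source: https://en.wikipedia.org/wiki/Markdown
clipped: 2026-05-12
tags: [clipped, reference, markdown]
---
```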
Workflow 3: Auto-link to daily note
The killer Obsidian feature. Every time you clip an article, append a wikilink to today's daily note. The article enters your daily timeline naturally:
```python
def clip_with_daily_link(url, tags=None):
    # ... same extraction as above ...
    out_name = out.stem  # note name without the .md extension
    daily = VAULT / 'Daily' / f'{date.today().isoformat()}.md'
    daily.parent.mkdir(parents=True, exist_ok=True)
    if not daily.exists():
        daily.write_text(f'# {date.today():%A, %B %d, %Y}\n\n## Clipped\n',
                         encoding='utf-8')
    with open(daily, 'a', encoding='utf-8') as f:
        f.write(f'- [[{out_name}]] from {url}\n')
```

Open Obsidian. Today's daily note now lists every clip from today as wikilinks. Click any one to jump into the full article. Open graph view to see today's reading clustered around your daily node.
Bonus: Bookmarklet that opens the web converter
Drag this to your bookmarks bar — clicking it on any article opens the MDisBetter URL converter pre-filled with the current page's URL:
```javascript
javascript:(function(){
  const u = encodeURIComponent(location.href);
  window.open('https://mdisbetter.com/convert/url-to-markdown?url=' + u, '_blank');
})();
```

One click takes you to the converter with the URL ready. Click Convert, click Download, drop the `.md` into your vault. Two clicks plus a drag — about as fast as the official Web Clipper but with cleaner output.
For a script-driven flow that goes straight to disk without opening a tab, use the Workflow 2 Python script above (Trafilatura).
Wikilink generation
Manually adding wikilinks ([[concept]]) is what makes Obsidian Obsidian — the graph emerges from those links. You can post-process clipped Markdown to auto-link known concepts:
```python
import re
from pathlib import Path

def add_wikilinks(md, vault_path):
    """Wrap mentions of existing vault notes in [[double brackets]]."""
    notes = {f.stem for f in Path(vault_path).glob('**/*.md')}
    for note in sorted(notes, key=len, reverse=True):  # longest names first
        if len(note) < 4:
            continue  # skip very short names that over-match
        # Word-boundary match, case-insensitive. The lookarounds skip text
        # that already starts or ends a [[wikilink]] (interior words of
        # multi-word note names can still slip through).
        md = re.sub(
            rf'(?<!\[)\b({re.escape(note)})\b(?!\]\])',
            r'[[\1]]',
            md,
            flags=re.IGNORECASE,
        )
    return md
```

Run this on every clip and you get automatic graph integration: any mention of an existing note becomes a link, building your network as you clip.
Image handling
By default, the URL converter outputs images as standard Markdown image tags (`![alt](url)`) pointing at the original URL. This keeps your vault tiny but creates an external dependency: if the source page deletes the image, your clip breaks.
Two alternatives, depending on your priorities:
Option A: Download images locally (post-processing)
```python
import hashlib
import re
from pathlib import Path

import requests

def localize_images(md, slug, attachments_dir):
    attachments_dir.mkdir(parents=True, exist_ok=True)

    def repl(m):
        alt, url = m.group(1), m.group(2)
        ext = Path(url.split('?')[0]).suffix or '.jpg'
        # hashlib gives a stable filename across runs (hash() is randomized)
        digest = hashlib.sha1(url.encode()).hexdigest()[:10]
        local = attachments_dir / f'{slug}_{digest}{ext}'
        if not local.exists():
            try:
                local.write_bytes(requests.get(url, timeout=15).content)
            except Exception:
                return m.group(0)  # keep the remote link on failure
        return f'![{alt}]({attachments_dir.name}/{local.name})'

    return re.sub(r'!\[([^\]]*)\]\(([^)]+)\)', repl, md)
```

Option B: Inline as base64 (post-processing)
For content you absolutely need to survive the source going down, encode each fetched image to base64 and rewrite the Markdown to `![alt](data:image/png;base64,...)`. Self-contained, but the file size grows a lot — use sparingly.
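A minimal sketch of that post-processing step. The fetcher is injected as a callable returning `(bytes, mime_type)` so the rewrite logic can be exercised without network access; wiring it to `requests.get` (reading the MIME type from the `Content-Type` header) is the obvious adaptation. The interface is an assumption of this sketch, not an MDisBetter API.

```python
import base64
import re

def inline_images(md, fetch):
    """Rewrite every ![alt](url) into a self-contained base64 data URI.

    `fetch` is any callable mapping url -> (bytes, mime_type); inject a
    requests-based fetcher in real use. (Assumed interface.)
    """
    def repl(m):
        alt, url = m.group(1), m.group(2)
        try:
            data, mime = fetch(url)
        except Exception:
            return m.group(0)  # keep the original link on any failure
        b64 = base64.b64encode(data).decode('ascii')
        return f'![{alt}](data:{mime};base64,{b64})'

    return re.sub(r'!\[([^\]]*)\]\(([^)]+)\)', repl, md)
```

Because failures fall back to the original link, running this over a clip never makes it worse than the URL-referencing default.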
Comparison: typical browser-extension clipper vs this approach
| Feature | Browser-extension clipper | mdisbetter |
|---|---|---|
| Setup | Install plugin + extension | None for the web tool, ~5 min for OSS scripts |
| Output cleanliness | Often mixed Markdown + HTML | Pure GFM Markdown |
| Image handling | Sometimes base64 (heavy) by default | URL refs (light) by default |
| Frontmatter | Manual | Scriptable (Trafilatura) |
| Daily note integration | Manual | Scriptable |
| Wikilink injection | Manual | Scriptable |
| Custom selector | Limited | Yes via BeautifulSoup |
Working with PDFs in Obsidian?
For PDF-to-vault workflows (research papers, reports, legal documents), see PDF to Markdown for Obsidian and the corresponding vault setup guide. The same wikilink and daily-note patterns apply — the only difference is the source format.
Folder structure inside the vault
How you organize clips inside the vault matters more than how you get them there. Three patterns that scale:
Inbox + curated
All clips land in Clips/Inbox/. Once a week, you triage: move keepers to Clips/Reference/ with proper frontmatter and tags, archive the rest to Clips/Archive/, delete obvious duplicates. Mimics email triage. Works well for high-volume clippers.
By topic
Clips land in topic folders directly: Clips/AI/, Clips/Programming/, Clips/Business/. Faster to find later, requires deciding the topic at clip time. Works well when your topics are stable.
By date
Clips land in date folders: Clips/2026/05/. Mirrors how you took the action — "the article I clipped last May" maps directly to a folder. Combined with the daily-note workflow this becomes very natural.
Pick one. Don't mix two. Mixing creates ambiguity that you'll later resent.
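Whichever pattern you pick, computing the destination inside your clip script is what keeps it consistent. A sketch for the by-date layout (the folder names mirror the examples above and are adjustable):

```python
from datetime import date
from pathlib import Path

def clip_folder(vault, d=None):
    """Return (and create) the by-date destination for a clip,
    e.g. <vault>/Clips/2026/05/. Folder names are illustrative."""
    d = d or date.today()
    folder = Path(vault) / 'Clips' / f'{d:%Y}' / f'{d:%m}'
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Swap the path expression for `'Clips' / 'Inbox'` or a topic argument to get the other two patterns; the point is that the routing decision lives in one function.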
Excalidraw and Canvas integration
Obsidian's Canvas plugin lets you arrange notes spatially on an infinite whiteboard. Clipped articles become first-class objects in canvases — drop a clip on a canvas, link it to your own notes, draw arrows showing how the ideas connect. Especially powerful for synthesis work (literature reviews, market research, idea development) where the spatial layout helps you see connections.
The clipper integration: the Markdown files we produce work in Canvas without any modification. Drop the file into Canvas, resize, position. The first H2 of the clip shows as the canvas card title; the rest is browseable.
Search and graph benefits
Obsidian's search and graph features depend on the Markdown being clean. With clipper output that mixes inline HTML, full-text search hits inside <span> wrappers and <div> attributes that are invisible in reading mode but very visible to the search index — leading to false positives and confusing search rankings. With pure Markdown clips, search behaves predictably: every hit is a hit on text the user can see. The graph is similarly cleaner: wikilinks are the only edge type, no spurious edges from clipper-injected metadata. Two months in, the difference compounds — your vault behaves like one coherent knowledge base instead of two parallel ones (your notes vs. your clipper output).
Backup and portability
One advantage of the Markdown-first approach over plugin-driven clippers: your clips are pure text in your filesystem. Backup is whatever your filesystem backup is — Time Machine, Backblaze, rsync to a NAS. Migration to another tool (Logseq, plain VS Code, Bear, iA Writer, anything that reads Markdown) is a folder copy.
Most browser-extension clippers' outputs are technically also Markdown, but the inline-HTML wrappers and base64 image bloat make them harder to migrate cleanly. Plain GFM Markdown round-trips through any tool that handles Markdown at all.
What the bookmarklet won't do
Two things to know about the bookmarklet:
- It opens the converter in a new tab pre-filled with the URL — you still click Convert and Download. To go fully hands-off, set up a Folder Action / Hazel rule that moves new `.md` downloads into your vault automatically. On Mac, Hazel does this in 30 seconds; on Windows, PowerToys can be configured similarly.
- The web converter fetches anonymously, with no access to your browser cookies. For login-required content, use the OSS Workflow 2 script with `requests` (or `httpx`) plus your session cookie or bearer token, then run Trafilatura on the response.
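That cookie-authenticated fetch can be sketched as follows. The getter is injected so the function can be exercised without network access, and the cookie name is site-specific; both are assumptions of this sketch, not part of any MDisBetter or Trafilatura API.

```python
import requests

def fetch_authed_html(url, session_cookie, getter=requests.get):
    """Fetch a login-required page using your own session cookie.

    `session_cookie` is a dict such as {'sessionid': '...'} copied from
    the browser's dev tools (the name varies by site). Feed the returned
    HTML to trafilatura.extract() exactly as in Workflow 2.
    """
    resp = getter(url, cookies=session_cookie, timeout=30)
    resp.raise_for_status()
    return resp.text
```

For a bearer token instead of a cookie, pass `headers={'Authorization': 'Bearer ...'}` to the getter rather than `cookies`.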
Recommendation
For one-off clipping: bookmarklet plus the web UI. For systematic knowledge-management workflows: the Trafilatura-based Python script with frontmatter and daily-note appending. Either way, the output is cleaner, the integration is deeper, and the friction is lower than with most browser-extension clippers. See also URL to Markdown for Notion for the same approach in a Notion workflow.