PDF to Markdown for Obsidian — Vault Setup Guide
Obsidian was built for Markdown. PDFs in your vault are dead-end attachments — no search, no graph, no backlinks. Convert your PDF library to Markdown and the same content becomes first-class vault material: searchable, linkable, visible on the graph, and ready for Zettelkasten workflows. Here's the complete setup, from first conversion to a working knowledge base.
Why Obsidian + Markdown is the right pairing
Obsidian is a Markdown editor first and a database second. Every .md file in your vault becomes a node: indexed by full-text search, available for [[wikilinks]], visible in the graph view, taggable, queryable via Dataview. PDFs in your vault get none of this — they're stored, not understood.
Three concrete capabilities you unlock by converting:
- Search across content: Ctrl+Shift+F (Cmd+Shift+F on macOS) searches for any phrase across your entire vault, including converted PDF content. Ctrl+O only opens the quick switcher, which matches note names. The search index treats Markdown as first-class data.
- Wikilinks: [[Smith2026]] in any note creates a backlink to your converted Smith 2026 paper, visible in the source's backlinks panel and in the graph view.
- Graph visualization: the relationships between your converted documents and your own notes become visible, and a real Zettelkasten emerges from what was just a folder of PDFs.
Converting your PDF library
Single document
For a single PDF: drop it into our PDF to Markdown for Obsidian converter, download the .md, save it into your vault. Two minutes start to finish.
Many PDFs at once
For batch ingestion (e.g., a literature review with 200 papers), MDisBetter is a web tool today and doesn't yet ship a public API or CLI, so the right path for true batch conversion is open-source software that runs on your machine. Marker or Docling, called from a Python loop, can handle hundreds of PDFs offline; the full step-by-step is in batch convert 100+ PDFs to Markdown. The output is a folder of .md files ready to drop into your vault.
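As a rough sketch of the "converter in a Python loop" idea, the function below walks a folder of PDFs and writes one .md file per PDF. The names convert_folder, convert_fn, and make_docling_converter are illustrative, not part of any tool mentioned here. The Docling adapter assumes the DocumentConverter / export_to_markdown interface shown in Docling's README, so check it against the version you install; any function that takes a PDF path and returns Markdown text can be passed in instead.

```python
from pathlib import Path
from typing import Callable, List


def make_docling_converter() -> Callable[[Path], str]:
    """Build a converter function backed by Docling (assumes `pip install docling`)."""
    from docling.document_converter import DocumentConverter  # imported lazily

    converter = DocumentConverter()  # created once and reused for every PDF

    def convert(pdf_path: Path) -> str:
        return converter.convert(str(pdf_path)).document.export_to_markdown()

    return convert


def convert_folder(src: Path, dst: Path, convert_fn: Callable[[Path], str]) -> List[Path]:
    """Convert every *.pdf in `src` to a .md file in `dst`; return the notes written."""
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src.glob("*.pdf")):
        out = dst / (pdf.stem + ".md")
        if out.exists():
            continue  # already converted, so re-runs after an interruption are cheap
        try:
            out.write_text(convert_fn(pdf), encoding="utf-8")
            written.append(out)
        except Exception as exc:  # one bad PDF should not stop the whole batch
            print(f"skipped {pdf.name}: {exc}")
    return written
```

A call such as convert_folder(Path("papers"), Path("Vault/Sources"), make_docling_converter()) would then fill the Sources folder; both paths are placeholders for your own layout.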
Continuous ingestion
If you keep adding new PDFs (research papers from arXiv, downloaded reports), pair an OSS converter with a folder watcher. A short Python script using watchdog + Marker (or Docling) watches an "Inbox" folder and emits the converted Markdown into "Sources" the moment a new PDF lands. Pair with Dropbox or iCloud sync and you can drop PDFs from anywhere — it's a one-time setup of about thirty lines, and it runs locally so nothing leaves your machine.
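A minimal sketch of such a watcher, under a few assumptions: the watchdog package is installed, convert_fn is any function that turns a PDF path into Markdown text (for example a thin wrapper around Marker or Docling), and the names process_pdf, watch, and InboxHandler are made up for this illustration. The two-second pause is a crude guess at how long a sync client needs to finish writing a file, and files that arrive by rename rather than creation may need an on_moved handler as well.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def process_pdf(pdf: Path, out_dir: Path, convert_fn: Callable[[Path], str]) -> Optional[Path]:
    """Convert one PDF into out_dir; return the new note path, or None for non-PDFs."""
    if pdf.suffix.lower() != ".pdf":
        return None
    out_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / (pdf.stem + ".md")
    out.write_text(convert_fn(pdf), encoding="utf-8")
    return out


def watch(inbox: Path, out_dir: Path, convert_fn: Callable[[Path], str]) -> None:
    """Run until Ctrl+C, converting each PDF that is created in `inbox`."""
    # watchdog is imported here so process_pdf stays usable without it installed
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class InboxHandler(FileSystemEventHandler):
        def on_created(self, event):
            if event.is_directory:
                return
            time.sleep(2)  # assumption: enough time for the file to be fully written
            try:
                note = process_pdf(Path(event.src_path), out_dir, convert_fn)
                if note is not None:
                    print(f"converted -> {note}")
            except Exception as exc:  # keep watching even if one file fails
                print(f"failed on {event.src_path}: {exc}")

    observer = Observer()
    observer.schedule(InboxHandler(), str(inbox), recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```

Keeping the conversion step in process_pdf, separate from the watcher, means the same function can be reused for a one-off batch run or tested without watchdog.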
Vault organization patterns
Two main schools of thought:
Topic-based folders
Vault/
  Topics/
    Machine Learning/
      Smith2026 - Transformers.md
    Statistics/
      Wasserman2024 - All of Statistics.md
  Permanent Notes/
  Daily Notes/

Pros: easy to browse by subject. Cons: many papers belong to multiple topics, so you have to pick one.
Source-only flat structure
Vault/
  Sources/
    Smith2026 - Transformers.md
    Wasserman2024 - All of Statistics.md
  Permanent Notes/
  Daily Notes/

Pros: no "which folder?" problem. Cons: relies entirely on tags and links for organization.
The flat-structure approach plays to Obsidian's strengths (graph view, backlinks) and is the one many experienced Obsidian users converge on. Use folders only for high-level separation (Sources vs Notes), not for topic categorization.
YAML front matter for metadata
Add a YAML block at the top of each converted note for searchable metadata:
---
title: Attention Is All You Need
authors: [Vaswani et al.]
year: 2017
type: paper
tags: [nlp, transformers, foundational]
source: 'arxiv.org/abs/1706.03762'
aliases: [Transformer paper, AIAYN]
---
Obsidian indexes everything in front matter for the file properties panel, Dataview queries, and graph filtering. Aliases let other notes link to this paper by any of the alternative names you list.
For batch-converted libraries, you can prepend a default YAML block programmatically: pull authors and year from the filename or PDF metadata, and set sensible defaults for tags. A short Python script is enough.
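One way to sketch this, assuming notes are named like the examples in this guide ("Smith2026 - Transformers.md", i.e. author, four-digit year, a dash, then the title). The function names, the "unsorted" default tag, and the type: paper default are choices made for this illustration only. Strings are written with json.dumps because a JSON string is also a valid double-quoted YAML scalar, which keeps titles containing colons or quotes safe.

```python
import json
import re
from pathlib import Path

# Assumed filename convention: "<Author><Year> - <Title>", e.g. "Smith2026 - Transformers"
NAME_PATTERN = re.compile(r"^(?P<author>[^\d]+?)(?P<year>\d{4})\s*-\s*(?P<title>.+)$")


def build_front_matter(stem: str, default_tags=("unsorted",)) -> str:
    """Return a YAML front matter block derived from a note's filename stem."""
    match = NAME_PATTERN.match(stem)
    title = match["title"].strip() if match else stem
    fields = [f"title: {json.dumps(title, ensure_ascii=False)}"]
    if match:  # author and year are only known when the name fits the convention
        author = json.dumps(match["author"].strip(), ensure_ascii=False)
        fields.append(f"authors: [{author}]")
        fields.append(f"year: {match['year']}")
    fields.append("type: paper")
    fields.append("tags: [" + ", ".join(default_tags) + "]")
    return "---\n" + "\n".join(fields) + "\n---\n\n"


def add_front_matter(note: Path) -> bool:
    """Prepend front matter to `note` unless it already starts with a YAML block."""
    text = note.read_text(encoding="utf-8")
    if text.startswith("---\n"):
        return False  # leave notes that already have front matter untouched
    note.write_text(build_front_matter(note.stem) + text, encoding="utf-8")
    return True
```

Running add_front_matter over every file matched by Path("Vault/Sources").glob("*.md") would cover a whole batch; the path is a placeholder, and re-running is harmless because notes that already have a block are skipped.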
Wikilinks and backlinks
The killer Obsidian feature for converted PDFs is wikilinks. From a permanent note, link to a converted source: [[Smith2026 - Transformers]]. The link works by filename. The source's backlinks panel now shows your permanent note as a referrer; the graph view shows an edge between them.
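Best practice: use aliases in front matter so you can link by short references. With aliases: [Transformer paper, AIAYN] in the source's front matter, typing [[AIAYN or [[Transformer paper brings up the source in link autocomplete, and accepting the suggestion inserts a piped link such as [[Smith2026 - Transformers|AIAYN]], which resolves to the same note as [[Smith2026 - Transformers]]. A bare [[AIAYN]] typed without accepting the suggestion points at a separate note named AIAYN instead.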
Tagging for cross-cutting themes
YAML front matter tags are great for paper-level themes. In-note #tags work for ideas that appear in only part of a note. A converted paper on graph neural networks might have tags: [gnn, deep-learning] at the top and #message-passing sprinkled in the section that discusses message passing.
The right level of tagging is a matter of taste: start broad (a handful of tags per paper) and add granularity as patterns emerge. Don't over-engineer the tag taxonomy upfront.
Zettelkasten workflow
The classic Zettelkasten pattern with converted papers:
- Source notes (your converted papers) live in Sources/: read-only, cite-able, structured
- Literature notes in Lit/: your written summary of each source in your own words, linked to the source
- Permanent notes in Permanent/: atomic ideas, each linking to the literature notes that influenced it
The graph view, after you've built up a few months of permanent notes, looks like a real concept map of your field — exactly what Luhmann's analog Zettelkasten was designed to produce. The converted PDFs are the substrate; your permanent notes are the contribution.
Useful Obsidian plugins for converted PDF workflows
- Citations: BibTeX integration for academic libraries, autocomplete cite keys
- Dataview: query YAML front matter (e.g., "all papers from 2024 tagged 'transformers'")
- Smart Connections: semantic search across the vault, useful when you don't remember exact wording
- Note Refactor: split a long converted paper into linked sub-notes by section
- Templater: standardize your literature-note template so every paper gets the same scaffolding
Combined with the converted Markdown, these turn Obsidian into a credible academic-knowledge platform — without the cost or rigidity of dedicated tools like Roam, Tinderbox, or Citavi.