Word to Markdown Benchmark: 8 Tools Tested for Accuracy
There are about a dozen real Word-to-Markdown tools on the market in 2026: some commercial, some open source, some free web tools, some library-only. We tested eight of the most widely used against five real-world document types: a resume, a contract, a technical spec, a financial report, and a thesis chapter. Each tool was scored on heading, table, image, and list preservation. The results are honest, including where our own tool ranks below alternatives. The TL;DR: Pandoc wins overall, Word2MD wins for AI image alt text, and MDisBetter is competitive in the free-web-tool tier.
Tools tested
- MDisBetter Word to Markdown — free web tool, no signup
- Word2MD.net — paid web tool with batch + AI image processing
- Pandoc — OSS CLI, the industry gold standard
- Mammoth.js — OSS JavaScript library
- Monkt — paid web tool focused on AI ingestion
- DocsToMarkdown — Google Docs add-on (also handles Word imports)
- ToMarkdown.org — free web tool, multi-format
- Hyperleap AI — paid AI document platform with Markdown export
Test documents
- Resume (2 pages) — H1, H2, bulleted lists, simple tables, contact info, hyperlinks. The cleanest test case.
- Contract (12 pages) — numbered headings (1.1, 1.2, 2.1.a), nested lists, footnotes, definitions, signature block, page numbers in headers/footers.
- Technical spec (24 pages) — H1-H4, code blocks, complex tables (merged cells, multi-row headers), inline equations, cross-references.
- Financial report (38 pages) — embedded charts (as images), wide tables (12+ columns), multi-column layout sections, footnotes per page.
- Thesis chapter (45 pages) — citations (Word bibliography), figure captions, equations (mix of inline and display), footnotes, indexed terms.
Scoring
Each tool was scored 0-5 on four axes per document, then the scores were averaged across documents:
- Heading preservation — H1-H6 levels intact
- Table preservation — structure, alignment, content survival
- Image preservation — extraction, alt text, placement
- List preservation — nesting, numbering, mixed types
Maximum: 20/20. Disclosure: we built MDisBetter. Where competitors win, we say so.
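The aggregation above can be sketched in a few lines. This is a minimal illustration with made-up scores, not the actual scoring harness:

```python
from statistics import mean

# The four axes described above, each scored 0-5 per document.
AXES = ("headings", "tables", "images", "lists")

def aggregate(per_document_scores):
    """Average each axis across documents, then sum the averages (max 20)."""
    axis_means = {
        axis: mean(doc[axis] for doc in per_document_scores)
        for axis in AXES
    }
    return axis_means, sum(axis_means.values())

# Two hypothetical documents scored for one tool:
docs = [
    {"headings": 5, "tables": 4, "images": 4, "lists": 5},
    {"headings": 5, "tables": 2, "images": 2, "lists": 3},
]
means, total = aggregate(docs)  # total is the /20 figure in the tables below
```

Averaging before summing means a tool that collapses on one document type (say, footnote-heavy contracts) still pays for it in the aggregate without one bad document dominating the total.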
Aggregate results
| Tool | Headings | Tables | Images | Lists | Total /20 |
|---|---|---|---|---|---|
| Pandoc | 5 | 4 | 4 | 5 | 18 |
| Word2MD.net | 4 | 4 | 5 | 4 | 17 |
| Hyperleap AI | 4 | 4 | 5 | 3 | 16 |
| MDisBetter | 5 | 3 | 3 | 4 | 15 |
| Mammoth.js | 4 | 3 | 3 | 4 | 14 |
| Monkt | 4 | 3 | 4 | 3 | 14 |
| DocsToMarkdown | 3 | 3 | 2 | 3 | 11 |
| ToMarkdown.org | 3 | 2 | 2 | 3 | 10 |
Per-document winners
| Document | Winner | Runner-up |
|---|---|---|
| Resume | Pandoc, MDisBetter, Word2MD (3-way tie at 19/20) | — |
| Contract | Pandoc | Word2MD |
| Technical spec | Pandoc | MDisBetter |
| Financial report | Word2MD | Hyperleap AI |
| Thesis chapter | Pandoc | Hyperleap AI |
Resume — three-way tie
The simplest test case, and the easiest to score perfectly. Pandoc, MDisBetter, and Word2MD all produced essentially flawless Markdown: heading levels correct, contact info preserved as a clean paragraph block, the skills table converted cleanly to a GFM pipe table, hyperlinks intact, bullet lists with proper nesting under each job entry. Mammoth.js and Hyperleap were close behind. The bottom three (Monkt, DocsToMarkdown, ToMarkdown) all dropped or mangled the contact-info table at the top.
Takeaway: for resumes and similar simple docs, almost any tool works. Pick whichever is most convenient.
Contract — Pandoc wins decisively
Numbered headings (1.1, 2.1.a) are where most tools stumble. Pandoc preserved the full numbering hierarchy as headings, with the numbers as part of the heading text. MDisBetter preserved the numbering but at slightly inconsistent heading levels for deeply nested clauses (4.2.a became plain bold rather than an H4). Word2MD behaved similarly. Mammoth.js converted the deeply nested numbered items to a regular list rather than headings.
Footnotes: Pandoc preserved them as GFM footnote syntax ([^1]) and was the only tool that did this cleanly. MDisBetter inlined footnote text in parentheses next to the reference, losing the original structure. Word2MD behaved similarly. Mammoth.js dropped the footnote markers entirely but kept the text. The bottom three lost footnotes completely.
The signature block (a small table at the end with date/name/signature columns) survived in Pandoc, MDisBetter, Word2MD, and Mammoth. The bottom three flattened it.
Technical spec — Pandoc by a hair, MDisBetter close
The hardest test for table preservation. The spec includes a 6-column table with merged header cells ("Authentication" spanning 3 columns, "Authorization" spanning 3 columns) and a multi-row header. No tool handled this perfectly — GFM doesn't support merged cells. The grading was on graceful degradation:
- Pandoc: repeated the merged value across cells. Visually misleading but data preserved.
- MDisBetter: left merged cells empty, with the value only in the first cell of the span. Visually correct, but the repeated cells lose their data.
- Word2MD: kept as raw HTML <table> with rowspan/colspan intact. Best fidelity but breaks portability.
- Mammoth.js: similar to Pandoc.
- Bottom three: dropped or flattened the table.
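As a sketch (cell values invented for illustration), the two GFM-compatible degradation strategies for a merged header look like this:

```markdown
Pandoc-style: merged value repeated across the span (data preserved, visually misleading)

| Authentication | Authentication | Authentication |
|---|---|---|
| Basic | Token | OAuth |

MDisBetter-style: merged value only in the first cell (visually correct, repeats lost)

| Authentication |  |  |
|---|---|---|
| Basic | Token | OAuth |
```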
Code blocks: all the top tools preserved fenced code blocks correctly. Pandoc and MDisBetter both detected and preserved language hints (```python). Word2MD shipped them without language hints. Mammoth and the rest converted code blocks to indented paragraphs.
Inline equations: Pandoc converted them to LaTeX-syntax inline math ($x = y$). MDisBetter and Word2MD preserved them as plain text. Mammoth and the rest preserved or dropped them inconsistently. For docs with heavy math, Pandoc wins outright.
Financial report — Word2MD wins on images
The financial report had 12 embedded charts (saved as images in the .docx) and several wide data tables. Word2MD shipped the images with AI-generated alt text describing each chart ("Bar chart showing Q1-Q4 revenue by region, EMEA growing from 100 to 120 over the year"). For LLM ingestion or accessibility, this is hugely valuable. Hyperleap AI does something similar.
MDisBetter, Pandoc, and Mammoth all preserve images correctly but with the original alt text from the Word doc (often empty or generic like "Chart 3"). For most users that's fine; for AI workflows, AI-generated alt text matters.
Multi-column layouts in Word (text in 2 or 3 parallel columns) were flattened to single-column reading order by every tool. None of them preserves column structure, and reasonably so: Markdown has no concept of columns.
Wide tables (12+ columns) survived structure in all top tools but became hard to read in Markdown source. None of the tools offered an alternative representation; that's manual cleanup territory.
Thesis chapter — Pandoc dominates
Word's bibliography integration is rare and finicky. Pandoc has the best handling: with the right flags it can extract the bibliography to BibTeX or CSL JSON and convert in-text citations to Pandoc's citeproc syntax. No other tool comes close.
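For the basic conversion, a thin wrapper around the Pandoc CLI is enough. The sketch below builds the command line (file names are hypothetical); `--extract-media` writes embedded images out to a directory and rewrites image links in the output to point at them:

```python
import subprocess

def pandoc_docx_to_md(docx_path: str, md_path: str, media_dir: str = "media"):
    """Build a Pandoc invocation for .docx -> GitHub-Flavored Markdown."""
    return [
        "pandoc", docx_path,
        "-t", "gfm",                   # GFM output (pipe tables, footnotes)
        "--extract-media", media_dir,  # pull embedded images out of the .docx
        "-o", md_path,
    ]

# To actually run it (requires Pandoc on PATH):
# subprocess.run(pandoc_docx_to_md("thesis.docx", "thesis.md"), check=True)
```

Bibliography extraction and citation conversion need additional flags beyond this basic invocation; consult the Pandoc manual for the citeproc options that match your Word citation setup.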
Footnotes (this chapter had 89 of them): Pandoc preserved every one as a GFM footnote. MDisBetter and Word2MD inlined them. Mammoth and the rest dropped them or preserved them only partially.
Display equations: Pandoc converted them to LaTeX block math. MDisBetter and Word2MD preserved the rendered image of each equation. Mammoth likewise kept equations as images. The bottom three dropped equations entirely.
Figure captions: Pandoc preserved them as italic paragraphs immediately after the figure (a common convention). The other tools typically stripped the caption-figure relationship.
Where MDisBetter wins
Free web tool with no signup, no quota for occasional use, no install. Best heading preservation in the free-web-tool tier. Good list handling. Multi-format breadth — same UX across Word, PDF, URL, audio, video.
Where MDisBetter loses
- Footnotes: Pandoc handles these dramatically better. If you have footnote-heavy docs (legal, academic, financial), use Pandoc.
- AI image alt text: Word2MD and Hyperleap generate descriptive alt text for embedded charts and screenshots. We don't. For LLM ingestion of image-heavy docs, those tools are better.
- Batch processing: We're one-file-at-a-time. Word2MD and Pandoc both handle batches.
- API/CLI/SDK: We're a web tool only. Pandoc has a CLI; Mammoth has a JS library; Word2MD has paid API access.
- Bibliography handling: Pandoc only.
Where Pandoc wins (almost everywhere)
Pandoc is the universal document converter and it shows. Best on contracts, technical specs, theses. CLI for batch. Free, OSS, MIT-licensed, no quota. The downside: install + command line, which is a barrier for non-technical users.
Where Word2MD wins
AI image processing (descriptive alt text on charts and screenshots), Word-specific batch processing, paid tier with higher limits. The right paid choice for image-heavy Word workflows.
Where Hyperleap AI wins
End-to-end AI document platform with strong Markdown export. If you're building AI workflows that need OCR, layout analysis, and Markdown extraction in one tool, Hyperleap is competitive.
Where Mammoth.js wins
In-process JavaScript library — perfect for Node.js apps that need to convert Word docs uploaded by users. Clean semantic output. The right tool for developers building tooling, not for end users.
Picking by use case
- One-off Word doc, no install, free: MDisBetter
- Batch, complex docs, command line OK: Pandoc
- Building Node.js tooling: Mammoth.js
- Image-heavy docs needing AI alt text: Word2MD or Hyperleap
- Academic / legal / footnote-heavy: Pandoc
- Google Docs add-on integration: DocsToMarkdown
What about other source formats?
Most document workflows mix Word with PDFs and web pages. For those, see our guides to the best free PDF to Markdown converters, PDF to Markdown conversion, and URL to Markdown conversion. Many production pipelines standardize on one converter across formats so that the output Markdown composes cleanly downstream.
Notes on reproducibility
All five test documents are real (anonymized) docs from production use, not synthetic ones. Versions tested: Pandoc 3.5, Mammoth.js 1.8, plus the latest production versions of all web tools as of May 2026. Web tools update frequently, so re-running this benchmark in 6 months may shift scores by 1-2 points. The broad ranking (Pandoc, Word2MD, and MDisBetter in the top tier; Mammoth and Monkt mid; ToMarkdown at the bottom) has been stable for over a year. See also our 2026 ranked review and formatting preservation accuracy test.
How to verify these numbers yourself
Pick three of your own representative Word docs (one simple, one complex, one with the features you care about most), run them through the top 3-4 tools that match your constraints (free vs. paid, web vs. CLI), and score the outputs on the same four axes. Within an hour you'll have numbers for your own corpus, which trump any generic benchmark, including this one.
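Manual side-by-side scoring goes faster with rough structural tallies of each tool's output. The sketch below uses heuristic regexes, not a real Markdown parser, but it is good enough to spot a tool that dropped half the headings or flattened a table:

```python
import re

def structure_counts(markdown_text: str) -> dict:
    """Rough tallies of headings, table rows, images, and list items."""
    lines = markdown_text.splitlines()
    return {
        "headings":   sum(1 for l in lines if re.match(r"#{1,6} ", l)),
        "table_rows": sum(1 for l in lines if l.lstrip().startswith("|")),
        "images":     len(re.findall(r"!\[[^\]]*\]\([^)]*\)", markdown_text)),
        "list_items": sum(1 for l in lines if re.match(r"\s*([-*+]|\d+\.) ", l)),
    }

sample = (
    "# Title\n\n- one\n- two\n\n"
    "| a | b |\n|---|---|\n| 1 | 2 |\n\n"
    "![chart](img.png)\n"
)
counts = structure_counts(sample)
```

Run it on the same source document converted by two tools: if one output reports 40 headings and the other 22, you know where to look before reading a single line.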