Word to Markdown Benchmark: 8 Tools Tested for Accuracy
There are about a dozen real Word-to-Markdown tools on the market in 2026: some commercial, some open source, some free web tools, some library-only. We tested eight of the most widely used against five real-world document types: a resume, a contract, a technical spec, a financial report, and a thesis chapter. Each tool was scored on heading, table, image, and list preservation. The results are honest, including where our own tool ranks below alternatives. The TL;DR: Pandoc wins overall, Word2MD wins for AI image alt text, and MDisBetter is competitive in the free-web-tool tier.
Tools tested
- MDisBetter Word to Markdown — free web tool, no signup
- Word2MD.net — paid web tool with batch + AI image processing
- Pandoc — OSS CLI, the industry gold standard
- Mammoth.js — OSS JavaScript library
- Monkt — paid web tool focused on AI ingestion
- DocsToMarkdown — Google Docs add-on (also handles Word imports)
- ToMarkdown.org — free web tool, multi-format
- Hyperleap AI — paid AI document platform with Markdown export
Test documents
- Resume (2 pages) — H1, H2, bulleted lists, simple tables, contact info, hyperlinks. The cleanest test case.
- Contract (12 pages) — numbered headings (1.1, 1.2, 2.1.a), nested lists, footnotes, definitions, signature block, page numbers in headers/footers.
- Technical spec (24 pages) — H1-H4, code blocks, complex tables (merged cells, multi-row headers), inline equations, cross-references.
- Financial report (38 pages) — embedded charts (as images), wide tables (12+ columns), multi-column layout sections, footnotes per page.
- Thesis chapter (45 pages) — citations (Word bibliography), figure captions, equations (mix of inline and display), footnotes, indexed terms.
Scoring
Each tool was scored 0-5 on four axes per document, then the scores were averaged across documents:
- Heading preservation — H1-H6 levels intact
- Table preservation — structure, alignment, content survival
- Image preservation — extraction, alt text, placement
- List preservation — nesting, numbering, mixed types
Maximum: 20/20. Disclosure: we built MDisBetter. Where competitors win, we say so.
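The aggregation above can be sketched in a few lines. This is a minimal illustration with made-up scores, not the actual scoring harness:

```python
from statistics import mean

# The four axes described above, each scored 0-5 per document.
AXES = ("headings", "tables", "images", "lists")

def aggregate(per_document_scores):
    """Average each axis across documents, then sum the averages (max 20)."""
    axis_means = {
        axis: mean(doc[axis] for doc in per_document_scores)
        for axis in AXES
    }
    return axis_means, sum(axis_means.values())

# Two hypothetical documents scored for one tool:
docs = [
    {"headings": 5, "tables": 4, "images": 4, "lists": 5},
    {"headings": 5, "tables": 2, "images": 2, "lists": 3},
]
means, total = aggregate(docs)  # total is the /20 figure in the tables below
```

Averaging before summing means a tool that collapses on one document type (say, footnote-heavy contracts) still pays for it in the aggregate without one bad document dominating the total.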
Aggregate results
| Tool | Headings | Tables | Images | Lists | Total /20 |
|---|---|---|---|---|---|
| Pandoc | 5 | 4 | 4 | 5 | 18 |
| Word2MD.net | 4 | 4 | 5 | 4 | 17 |
| Hyperleap AI | 4 | 4 | 5 | 3 | 16 |
| MDisBetter | 5 | 3 | 3 | 4 | 15 |
| Mammoth.js | 4 | 3 | 3 | 4 | 14 |
| Monkt | 4 | 3 | 4 | 3 | 14 |
| DocsToMarkdown | 3 | 3 | 2 | 3 | 11 |
| ToMarkdown.org | 3 | 2 | 2 | 3 | 10 |
Per-document winners
| Document | Winner | Runner-up |
|---|---|---|
| Resume | Pandoc, MDisBetter, Word2MD (3-way tie at 19/20) | — |
| Contract | Pandoc | Word2MD |
| Technical spec | Pandoc | MDisBetter |
| Financial report | Word2MD | Hyperleap AI |
| Thesis chapter | Pandoc | Hyperleap AI |
Resume — three-way tie
The simplest test case, and the easiest to score perfectly. Pandoc, MDisBetter, and Word2MD all produced essentially flawless Markdown: heading levels correct, contact info preserved as a clean paragraph block, the skills table converted cleanly to a GFM pipe table, hyperlinks intact, bullet lists with proper nesting under each job entry. Mammoth.js and Hyperleap were close behind. The bottom three (Monkt, DocsToMarkdown, ToMarkdown) all dropped or mangled the contact-info table at the top.
Takeaway: for resumes and similar simple docs, almost any tool works. Pick whichever is most convenient.
Contract — Pandoc wins decisively
Numbered headings (1.1, 2.1.a) are where most tools stumble. Pandoc preserved the full numbering hierarchy as headings, with the numbers as part of the heading text. MDisBetter preserved the numbering but at slightly inconsistent heading levels for deeply nested clauses (4.2.a became plain bold rather than an H4). Word2MD behaved similarly. Mammoth.js converted the deeply nested numbered items to a regular list rather than headings.
Footnotes: Pandoc preserved them as GFM footnote syntax ([^1]) and was the only tool that did this cleanly. MDisBetter inlined footnote text in parentheses next to the reference, losing the original structure. Word2MD behaved similarly. Mammoth.js dropped the footnote markers entirely but kept the text. The bottom three lost footnotes completely.
The signature block (a small table at the end with date/name/signature columns) survived in Pandoc, MDisBetter, Word2MD, and Mammoth. The bottom three flattened it.
Technical spec — Pandoc by a hair, MDisBetter close
The hardest test for table preservation. The spec includes a 6-column table with merged header cells ("Authentication" spanning 3 columns, "Authorization" spanning 3 columns) and a multi-row header. No tool handled this perfectly — GFM doesn't support merged cells. The grading was on graceful degradation:
- Pandoc: repeated the merged value across cells. Visually misleading but data preserved.
- MDisBetter: left merged cells empty, with the value only in the first cell of the span. Visually correct, but the repeated cells lose their data.
- Word2MD: kept as raw HTML <table> with rowspan/colspan intact. Best fidelity but breaks portability.
- Mammoth.js: similar to Pandoc.
- Bottom three: dropped or flattened the table.
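As a sketch (cell values invented for illustration), the two GFM-compatible degradation strategies for a merged header look like this:

```markdown
Pandoc-style: merged value repeated across the span (data preserved, visually misleading)

| Authentication | Authentication | Authentication |
|---|---|---|
| Basic | Token | OAuth |

MDisBetter-style: merged value only in the first cell (visually correct, repeats lost)

| Authentication |  |  |
|---|---|---|
| Basic | Token | OAuth |
```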
Code blocks: all the top tools preserved fenced code blocks correctly. Pandoc and MDisBetter both detected and preserved language hints (```python). Word2MD shipped them without language hints. Mammoth and the rest converted code blocks to indented paragraphs.
Inline equations: Pandoc converted them to LaTeX-syntax inline math ($x = y$). MDisBetter and Word2MD preserved them as plain text. Mammoth and the rest preserved or dropped them inconsistently. For docs with heavy math, Pandoc wins outright.
Financial report — Word2MD wins on images
The financial report had 12 embedded charts (saved as images in the .docx) and several wide data tables. Word2MD shipped the images with AI-generated alt text describing each chart ("Bar chart showing Q1-Q4 revenue by region, EMEA growing from 100 to 120 over the year"). For LLM ingestion or accessibility, this is hugely valuable. Hyperleap AI does something similar.
MDisBetter, Pandoc, and Mammoth all preserve images correctly but with the original alt text from the Word doc (often empty or generic like "Chart 3"). For most users that's fine; for AI workflows, AI-generated alt text matters.
Multi-column layouts in Word (text in 2 or 3 parallel columns) were flattened to single-column reading order by every tool. None of them preserves column structure, and reasonably so: Markdown has no concept of columns.
Wide tables (12+ columns) survived structure in all top tools but became hard to read in Markdown source. None of the tools offered an alternative representation; that's manual cleanup territory.
Thesis chapter — Pandoc dominates
Word's bibliography integration is rare and finicky. Pandoc has the best handling: with the right flags it can extract the bibliography to BibTeX or CSL JSON and convert in-text citations to Pandoc's citeproc syntax. No other tool comes close.
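For the basic conversion, a thin wrapper around the Pandoc CLI is enough. The sketch below builds the command line (file names are hypothetical); `--extract-media` writes embedded images out to a directory and rewrites image links in the output to point at them:

```python
import subprocess

def pandoc_docx_to_md(docx_path: str, md_path: str, media_dir: str = "media"):
    """Build a Pandoc invocation for .docx -> GitHub-Flavored Markdown."""
    return [
        "pandoc", docx_path,
        "-t", "gfm",                   # GFM output (pipe tables, footnotes)
        "--extract-media", media_dir,  # pull embedded images out of the .docx
        "-o", md_path,
    ]

# To actually run it (requires Pandoc on PATH):
# subprocess.run(pandoc_docx_to_md("thesis.docx", "thesis.md"), check=True)
```

Bibliography extraction and citation conversion need additional flags beyond this basic invocation; consult the Pandoc manual for the citeproc options that match your Word citation setup.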
Footnotes (this chapter had 89 of them): Pandoc preserved every one as a GFM footnote. MDisBetter and Word2MD inlined them. Mammoth and the rest dropped them or preserved them only partially.
Display equations: Pandoc converted them to LaTeX block math. MDisBetter and Word2MD preserved the rendered image of each equation. Mammoth likewise kept equations as images. The bottom three dropped equations entirely.
Figure captions: Pandoc preserved them as italic paragraphs immediately after the figure (a common convention). The other tools typically stripped the caption-figure relationship.
Where MDisBetter wins
Free web tool with no signup, no quota for occasional use, no install. Best heading preservation in the free-web-tool tier. Good list handling. Multi-format breadth — same UX across Word, PDF, URL, audio, video.
Where MDisBetter loses
- Footnotes: Pandoc handles these dramatically better. If you have footnote-heavy docs (legal, academic, financial), use Pandoc.
- AI image alt text: Word2MD and Hyperleap generate descriptive alt text for embedded charts and screenshots. We don't. For LLM ingestion of image-heavy docs, those tools are better.
- Batch processing: We're one-file-at-a-time. Word2MD and Pandoc both handle batches.
- API/CLI/SDK: We're a web tool only. Pandoc has a CLI; Mammoth has a JS library; Word2MD has paid API access.
- Bibliography handling: Pandoc only.
Where Pandoc wins (almost everywhere)
Pandoc is the universal document converter and it shows. Best on contracts, technical specs, theses. CLI for batch. Free, OSS, MIT-licensed, no quota. The downside: install + command line, which is a barrier for non-technical users.
Where Word2MD wins
AI image processing (descriptive alt text on charts and screenshots), Word-specific batch processing, paid tier with higher limits. The right paid choice for image-heavy Word workflows.
Where Hyperleap AI wins
End-to-end AI document platform with strong Markdown export. If you're building AI workflows that need OCR, layout analysis, and Markdown extraction in one tool, Hyperleap is competitive.
Where Mammoth.js wins
In-process JavaScript library — perfect for Node.js apps that need to convert Word docs uploaded by users. Clean semantic output. The right tool for developers building tooling, not for end users.
Picking by use case
- One-off Word doc, no install, free: MDisBetter
- Batch, complex docs, command line OK: Pandoc
- Building Node.js tooling: Mammoth.js
- Image-heavy docs needing AI alt text: Word2MD or Hyperleap
- Academic / legal / footnote-heavy: Pandoc
- Google Docs add-on integration: DocsToMarkdown
What about other source formats?
Most document workflows mix Word with PDFs and web pages. For those, see our guides to the best free PDF to Markdown converters, PDF to Markdown conversion, and URL to Markdown conversion. Many production pipelines standardize on one converter across formats so that the output Markdown composes cleanly downstream.
Notes on reproducibility
All five test documents are real (anonymized) docs from production use, not synthetic ones. Versions tested: Pandoc 3.5, Mammoth.js 1.8, plus the latest production versions of all web tools as of May 2026. Web tools update frequently, so re-running this benchmark in 6 months may shift scores by 1-2 points. The broad ranking (Pandoc, Word2MD, and MDisBetter in the top tier; Mammoth and Monkt mid; ToMarkdown at the bottom) has been stable for over a year. See also our 2026 ranked review and formatting preservation accuracy test.
How to verify these numbers yourself
Pick three of your own representative Word docs (one simple, one complex, one with the features you care about most), run them through the top 3-4 tools that match your constraints (free vs. paid, web vs. CLI), and score the outputs on the same four axes. Within an hour you'll have numbers for your own corpus, which trump any generic benchmark, including this one.
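Manual side-by-side scoring goes faster with rough structural tallies of each tool's output. The sketch below uses heuristic regexes, not a real Markdown parser, but it is good enough to spot a tool that dropped half the headings or flattened a table:

```python
import re

def structure_counts(markdown_text: str) -> dict:
    """Rough tallies of headings, table rows, images, and list items."""
    lines = markdown_text.splitlines()
    return {
        "headings":   sum(1 for l in lines if re.match(r"#{1,6} ", l)),
        "table_rows": sum(1 for l in lines if l.lstrip().startswith("|")),
        "images":     len(re.findall(r"!\[[^\]]*\]\([^)]*\)", markdown_text)),
        "list_items": sum(1 for l in lines if re.match(r"\s*([-*+]|\d+\.) ", l)),
    }

sample = (
    "# Title\n\n- one\n- two\n\n"
    "| a | b |\n|---|---|\n| 1 | 2 |\n\n"
    "![chart](img.png)\n"
)
counts = structure_counts(sample)
```

Run it on the same source document converted by two tools: if one output reports 40 headings and the other 22, you know where to look before reading a single line.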